The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2670 

A DNA sequence (GASx971) was identified in S.pyogenes <SEQ ID 7839> which encodes the amino acid
sequence <SEQ ID 7840>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
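
The Final Results lines follow a regular pattern, so they can be read mechanically. The following Python sketch is illustrative only and is not part of the analysis described in this document. The helper names `parse_final_results` and `predicted_location` are hypothetical, and taking the highest Certainty value as the predicted location is an assumption made for the example. The pattern tolerates stray spaces inside the number, which can appear in extracted text.

```python
import re

# Hypothetical helper, not part of the original analysis.
# Matches lines such as:
#   bacterial outside Certainty=0.3000 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<score>\d[\d\s.]*?)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples found in `text`."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            # Remove any whitespace that crept into the number.
            score = float(re.sub(r"\s+", "", match.group("score")))
            results.append((match.group("site"), score, match.group("call").strip()))
    return results


def predicted_location(text):
    """Return the site with the highest certainty (assumed rule), or None."""
    results = parse_final_results(text)
    if not results:
        return None
    return max(results, key=lambda item: item[1])[0]


example = """
bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(example))
```

Applied to the three lines of Example 2670, the sketch selects `outside`, the only location marked Affirmative.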

Example 2671

A DNA sequence (GASx972) was identified in S.pyogenes <SEQ ID 7841> which encodes the amino acid 
sequence <SEQ ID 7842>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3226 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2672 

A DNA sequence (GASx973) was identified in S.pyogenes <SEQ ID 7843> which encodes the amino acid 
sequence <SEQ ID 7844>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1830 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2673

A DNA sequence (GASx975) was identified in S.pyogenes <SEQ ID 7845> which encodes the amino acid 
sequence <SEQ ID 7846>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4757 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07248 GB:AP001519 unknown [Bacillus halodurans] 
Identities = 46/134 (34%), Positives = 73/134 (54%)

Query: 23 KQPQDEKK^TDADVDAIIDKKFAKWKSEQEAEKSEAKKMAKMNEKEKADYEKQKLLDELQ 82 

K + E+ +T +V+ 1+ + A+ ++E EA+K+AKMN ++K +YE +KL E + 
Sbjct: 66 KPNKTERLFTQEEVNRIVKDRIJARALKI)KEEAIKEAEKLAKM^IAEQKREyELEKLRRENE 125 

Query: 83 ELKM)KTRNELTAVARQMFAESEINVNDDVLGLVVT^ 142 

+LK + R EL A +M E+ I +DDVL W DAEQT+ V T + K+ 
Sbjct: 126 QLKKAQMRYELGREATKMLGEAGIMADDDVLSFVVRDDAEQTQEAVKTFISLVDKIADMR 185 

Query: 143 RKALVRQTTPSTGG 156

K ++ P G 
Sbjct: 186 MKEKLKGRPPKKDG 199 
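
The summary line of each alignment gives counts as fractions of the aligned length, and the percentages can be rechecked from those fractions. The Python sketch below is illustrative only; `summarize_alignment` is a hypothetical helper, and the use of truncation rather than rounding is an assumption inferred from the printed identity figures (for example 46/134 printed as 34%), not something the document states.

```python
import re

# Hypothetical helper for rechecking an alignment summary line.
FRACTION = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def summarize_alignment(header):
    """Map each statistic name to (count, aligned_length, truncated_percent)."""
    stats = {}
    for name, count, length in FRACTION.findall(header):
        count, length = int(count), int(length)
        # Integer division truncates toward zero (assumed convention).
        stats[name] = (count, length, 100 * count // length)
    return stats


header = "Identities = 46/134 (34%), Positives = 73/134 (54%)"
print(summarize_alignment(header))
```

The recomputed values are best treated as a sanity check on the extracted text rather than as a replacement for the printed figures.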

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2674 

A DNA sequence (GASx976) was identified in S.pyogenes <SEQ ID 7847> which encodes the amino acid 
sequence <SEQ ID 7848>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2478 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC79545 GB:U88974 ORF30 [Streptococcus thermophilus temperate
bacteriophage O1205]
Identities = 43/119 (36%), Positives = 66/119 (55%), Gaps = 16/119 (13%)

Query: 9 SKEILHNLDYEAISVTLDSNKIG KKWPAGTIIAGKDKSIFEDRKQKVETVTNEE 63 

+ 1+ +L Y+A+S T+DS+ G KK + AGT++AG S1F+DR + V 
Sbjct: 9 TSNIVRSLPYKAVSATVDSSYPGVLVDGKKYIKAGTLVAGNGGSIFDDRTKSV 61 

Query: 64 VSTKEYVDGILLTDVIJLTOGDAVGSCVYRGTINADKLADSSVAENYDDLEEVLPHIVFI 122 

V K +GI+L DVDLT + V S +Y G + DK+ + D +++ LP + FI 
Sbjct: 62 v™KTEPEGIVIiYDvDLTIDNW-SVLYAGEVYKDK™GGDIT---DWKKALPLVKFI 116 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2675 

A DNA sequence (GASx978) was identified in S.pyogenes <SEQ ID 7849> which encodes the amino acid
sequence <SEQ ID 7850>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4238 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC79546 GB:U88974 ORF31 [Streptococcus thermophilus temperate 
bacteriophage O1205] 
Identities = 195/343 (56%), Positives = 256/343 (73%), Gaps = 1/343 (0%) 

Query: 1 MALIHEIITSENIKGFYNAKNENVENTLGEKAFPPKQQLGLKLSFIKGAAGKPVTLKAAA 60 

M LI++ +T+ NI G++NA ENV +TLGE FP ++QLG KLS+IKGA+G+ V LKAAA 
Sbjct: 1 MGLIYDKVTASNIAGYFNALQENVSSTLGESIFPARKQLGTKLSYIKGASGQSVALKAAA 60 

Query: 61 FDTKVPLRDRMAVELIDEEMPFFKFAMLVKEADRQQIM^IAQTKNNELIDTILASIYNDQ 120

FDT V +RDR++ E+ DE+MPFFKEAMLVKE DRQQLN++ + N L++TI+A I+ND 
Sbjct: 61 FDTNVTIRDRVSAEMHDEQMPFFKFAMLVKFJSTORQQLNLVKDSGNAVIjVNTIVAGIFNDN 120 

Query: 121 ATLIAGAKARLEAMRMEVLSKGKIHIQSNGVMKDIDYGLAEDQTTKPDAKWDSAGTATPL 180 
TL+ GA+ARLEAMRM+VL+ GKI S+GV KDIDYG+ D + W G ATPL

Sbjct: 121 LTLWGARARLFAMRMQVLATGKIAFTSDGVNKDIDYGVKPDHKKQVSKSWAEPG-ATPL 179 

Query: 181 KDIEKAIEKMAERGFVPFAII^SKTFSLIKNAESTLDWKPMAPNGAAVTKRDUTTYLE 240 
D+E AIE E G PE +MN+KTF LI+ A ST+ V+KP+A +G+AVTK +L Y+ 
Sbjct: 180 ADLEDAIETARELGLNPERAVMNAKTFGLIRKAASTVKVIKPLAGDGSAVTKAELENYIA 239

Query: 241 DELQIKVILKDGMFVGDDGESRKYFPDGFATLVPNGNLGYTVFGTTPEQSDLLGGEATDA 300 

D + ++L++G + D GE K++PDG TL+PNG LG TVFGTTPE+SDL +A 
Sbjct: 240 DNFGVSIVIjENGTYRNDKGEVSKFYPDGHLTLIPNGPLGNTVFGTTPEESDLFADNTVNA 299 



Query: 301 NVSIVETGIAITTTKTTDPVNVQTKVSMIALPSFERLEEVHII 343 

V IV+ GIA+TTTKTTDPVNVQTKVSM+ALPSFERL++V+++ 
Sbjct: 300 EVEIVDNGIAvTTTKTTDPvNVQTKVSMVALPSFERLDDVYML 342 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
Example 2676

A DNA sequence (GASx979) was identified in S.pyogenes <SEQ ID 7851> which encodes the amino acid 
sequence <SEQ ID 7852>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3319 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2677 

A DNA sequence (GASx980) was identified in S.pyogenes <SEQ ID 7853> which encodes the amino acid 
sequence <SEQ ID 7854>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2385 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC34404 GB:AF158600 gp113 [Streptococcus thermophilus
bacteriophage Sfi11]
Identities = 53/109 (48%), Positives = 79/109 (71%), Gaps = 4/109 (3%)

Query: 11 IVKNVKLDLGIEDDNQDQLLEMLLNRITDHFKANYGVLEIDNAFSFVLEDCLIARFNRRG 70
+++NV +DL I DDN LL +LI> RI +HFKA YGV E+D+ +F+ EDCL+ RFNRRG 
Sbjct: 9 VIQNVSVDLNINDDN LLGILLERIvNHFKAEYGVDEVDDNLAFIFEDCLVKRFNRRG 65 

Query: 71 SERAKTEEVEGHKTTYYDHLNEFEPYDAMIMAKLNLIKDKSRKGGLYFL 119
+E A++E ++GH +YYD+ NEF+PYD M+ +L ++++G + FL

Sbjct: 66 AEGARSESIDGHSMSYYDNENEFDPYDNMLQ-RLYGTSGQAKEGEVLFL 113 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2678

A DNA sequence (GASx981) was identified in S.pyogenes <SEQ ID 7855> which encodes the amino acid 
sequence <SEQ ID 7856>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5714 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA59188 GB:X84706 b3 [Bacteriophage Bl] 
Identities = 28/82 (34%), Positives = 49/82 (59%), Gaps = 2/82 (2%)

Query: 1 MRYADRVTFVKTT-DEQYNPDLGEYTHTEVISITKPCFVMDMGMEKSVQIFGDYQKDRKV 59 

+RY D VTF+K + D Y+PDLGE+ E + D+G ++SV++FGD +K KV 

Sbjct: 1 LRYLDEVTFIKESPDSHYDPDLGEWVEKEPTRTVFSANITDIGTDRSVEVFGDIKKGAKV 60 

Query: 60 IYLKQPYT-KAFDYCEYEGRRY 80

+ + + +DY E++ +++ 
Sbjct: 61 MRMMPLFNMPKYDYIEFDNKKW 82 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2679 

A DNA sequence (GASx982) was identified in S.pyogenes <SEQ ID 7857> which encodes the amino acid 
sequence <SEQ ID 7858>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2509 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC34406 GB:AF158600 gp114 [Streptococcus thermophilus
bacteriophage Sfi11]
Identities = 44/103 (42%), Positives = 65/103 (62%), Gaps = 5/103 (4%)

Query: 17 GLKKKLELIIKKDAVKK IVRDNGTQLQRKMINKAVFTKGYSTGATRRSITMQIGDGG 73 

GL + + ++K + +K ++R G++L+ +N+A F KGYSTGATRRS IT+Q+

Sbjct: 8 GLDE^QSLLKNASPEKRSKVLRKYGSKLKEAAVNRAQFNKGYSTGATRRSITLQVESDK 67 

Query: 74 LSVKVKPGTHYAGYLERGTRLMSKQPFVLPALKEQKVKFRKDL 116 
+V+ T Y+GYLE GTR M QPF+ PAL E K ++L 
Sbjct: 68 ATVEAL--TSYSGYLEVGTRKMEAQPFMKPALDEVAPKMVEEL 108

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2680 

A DNA sequence (GASx983) was identified in S.pyogenes <SEQ ID 7859> which encodes the amino acid
sequence <SEQ ID 7860>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3098 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA32612 GB:L31366 putative [Bacteriophage Tuc2009]
Identities = 88/129 (68%), Positives = 108/129 (83%)

Query: 1 MIKTRDQSIFDEMFKRIQSLGFKVYDYKPMTKVPYPFVEMESTDAEYIPNKDDIKGSVEL 60 
MIKTRDQSIFDE+FKRIQ+LG+ VYDYKPM EV YPFVE+E+T + NK DIKG+V L
Sbjct: 1 MIKTRDQSIFDELFKRIQALGYTVYDYKPMNEVGYPFVELENTQTIHEANKTDIKGTVSL 60

Query: 61 MLSVWGVQKKRKQVSDMASAIFSQALTVESSDVFRWSLNTRQSSIQMLDDTTTVTPLKRA 120

LSVWG+QKKRK+VSDMAS IF+QAL + ++D + W+LN++ S+IQMLDDTTT TPLKRA
Sbjct: 61 SLSVWGLQKKRKEVSDMASNIFNQALNISATDGYSWALNSQASTIQMLDDTTTHTPLKRA 120

Query: 121 IVTLRFNLR 129 

++ L F LR 
Sbjct: 121 LINLEFRLR 129 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2681 

A DNA sequence (GASx984R) was identified in S.pyogenes <SEQ ID 7861> which encodes the amino acid
sequence <SEQ ID 7862>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1736 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2682 

A DNA sequence (GASx985) was identified in S.pyogenes <SEQ ID 7863> which encodes the amino acid 
sequence <SEQ ID 7864>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.
The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA32613 GB:L31366 structural protein [Bacteriophage Tuc2009] 
Identities = 81/185 (43%), Positives = 111/185 (59%), Gaps = 22/185 (11%)

Query: 4 QLEAKQGIHSILLFRLLKEASSEAATKLAFQTEHEVGKSRDVDGQKTKDGIIQSVGALEY 63 

+L AKQG ILL+RLL +A+ EAA KLAFQTEH K+RD + TKDG I S+ A+EY 
Sbjct: 3 ELTAKQGKDIILLYRLLSKATKEAAWKLAFQTEHSNEKTRDYNTTATKDGTIGSLAAIEY 62 

Query: 64 DFKATSIIAKGDVLAAKLEKAMENGEIiWIVroiDLEETSKNGDSDNKIJUSrVWGIDKNGTN 123 

ATSI A GD +++KA ++GE++++W+ID E 
Sbjct: 63 SLSATSIAANGDPHLDEMDKAFDDGEIIDVWEIDKAEKG 101

Query: 124 RGNGKYIATYYQGYISSFSAKKNAEENIEIEMEFAINGVGQKGFATLTDAQKAAVQYAFK 183

+GKY A Y + Y++SFS + N+E+ +E+ +EF + G QKG ATLT+ Q VQY FK 
Sbjct: 102 -SDGKYKAKYLRAYLTSFSYEPNSEDALELSLEFGVFGKPQKGQATLTEEQANWQYVFK 160 

Query: 184 DTTKG 188 
DT G

Sbjct: 161 DOTAG 165 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 



Example 2683

A DNA sequence (GASx986) was identified in S.pyogenes <SEQ ID 7865> which encodes the amino acid 
sequence <SEQ ID 7866>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2273 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA59192 GB:X84706 a2 [Bacteriophage Bl] 
Identities = 54/111 (48%), Positives = 72/111 (64%), Gaps = 1/111 (0%)

Query: 1 MQLEIKGKTHNVKFGTRFVAEMDKNHIAERQGFKFGAGLQSSV-PFLIDHSWTLAEVIY 59 

M+L IKGK + KFG +FV E+DKN + E+ G FG L + P L ++ TL+ V++ 
Sbjct: 1 MELTIKGKQvHFKFGVKFWELDKNIiVIEQNGVSFGLALAVKIIPELEMANIATLSNVLF 60 

Query: 60 TGTITEPPRPSLNDIYDYIDEVEDIEKLFDDVLDELRQSNASKLFMAQVEK 110 

G TE P+ S DI D+IDE EDIEKLFDDVL E+ +SN KL A++ K 
Sbjct: 61 LGNRTETPKLSQGDIDDFIDECEDIEKLFDDVLKEITESNTGKLIKAKMTK 111 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 



Example 2684 

A DNA sequence (GASx987) was identified in S.pyogenes <SEQ ID 7867> which encodes the amino acid 
sequence <SEQ ID 7868>. Analysis of this protein sequence reveals the following: 



Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2735 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA59193 GB:X84706 c2 [Bacteriophage Bl] 
Identities = 40/111 (36%), Positives = 57/111 (51%), Gaps = 10/111 (9%)

Query: 2 IVLNCIRYLGMTDINEIGRLTLYEYDLLMTGKALAAVDESHKAHKQAWINHQVTATKLVG 61

+++ +R G+ D++ R+T+ EY + L +DE ++QAW N QV ATK G 

Sbjct: 15 MMIRFLRCFGIQDLSVFERMTIREYSIRSIAFQLRTLDEEEFIYEQAWANWQVQATKQQG 74 

Query: 62 GKKNKKEVPVYKKFKDFFD YEEEIRKI -TQEIDEGYDKKGMDLLLKAN 108 

K P+Y FK FFD E EI I + E D K +DL+ KAN

Sbjct: 75 KK PLYPTFKKFFDKKKLENEILGIESPENKFKKDNKLIDLMKKAN 119 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2685

A DNA sequence (GASx989) was identified in S.pyogenes <SEQ ID 7869> which encodes the amino acid 
sequence <SEQ ID 7870>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2869 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA66560 GB:X97918 gene 19.1 [Bacteriophage SPP1] 
Identities = 66/232 (28%), Positives = 106/232 (45%), Gaps = 12/232 (5%)

Query: 38 FRTLTVSGRDVVDLEHQTTSVLGRNGEYFHNATVEVRKLEIKAKISGKDNKS-MRLQYEK 96

F V GR V +E ++ G +G ++ R+LE+ A + G ++ +R + E

Sbjct: 24 FLVQEVRGRSVYSIEMGKRTIAGVDGGVITTESLPARELEVDAIVFGDGTETDLRRRIEY 83

Query: 97 LNKLIVSHNQVFLSFSDEPDRNYLGIFKSKDVPEEVSNEQIIGLTFICYNPFK-----MS 151

LN L+ V ++FSDEP RYG++ +E +LFC+PK + 

Sbjct: 84 LNFLLHRDTDVPITFSDEPSRTYYGRYEFATEGDEKGGFHKVTLNFYCQDPLKYGPEVTT 143 

Query: 152 DVKTKKGTSIQNGGLFQTKPIITIjNLSSPTKEIKLLHVESQKYIRLT GTYTTDEIK 207

DV T T ++N GL T P I S+ E ++ ++ ++ G T D + 

Sbjct: 144 DV-TTASTPVKNTGIAvTNPTIRCOTSTSATEYEMQLLDGSTVVKFLKVKYGFNTGDTLV 202 



Query: 208 IDMATGKITQNGRNILGDLDMINSRYFELLPGNNTLQCANAAITAEFREVYL 259 
ID +T NG++I+ L+IS++LPNT A TFE+L

Sbjct: 203 IDCHERSVTLNGQDIMPAL-LIQSDWIQLKPQVNTYLKATQPSTIVFTEKFL 253 
Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2686 

A DNA sequence (GASx990) was identified in S.pyogenes <SEQ ID 7871> which encodes the amino acid 
sequence <SEQ ID 7872>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2861 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04681 GB:AP001510 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 116/449 (25%), Positives = 198/449 (43%), Gaps = 79/449 (17%)

Query: 2 IYLFDKlERLVATVG-TDDLLSWHFKVKNNDWDQASFEVPVDYDVEPFvYFGFFNYDPHQ 60
+++FD+ ++L+ T+ + L+ F+ + N F ++ E + + HQ
Sbjct: 4 LFIFDREDQLLTTLTESTGLVRALFREELNRVPNQPFAFTIEASSEEAKHV IEEHQ 59

Query: 61 KEDVFKLFKVIDYNLEDSKFYEG LDKAESDLDTIAI IKDKRFRQSSADA 109
KE +LF + + LED G + A +L I++ Q + +A
Sbjct: 60 WFRDKEGDLRLFVIKE - - LEDVDGLDGPQTTAI CEPAFMELAEHMI VEQS WNQPAHEA 117

Query: 110 CIDGALEGTGYQVGKVEGITNVRTLSYYYISPRAALIKIVEAFNCEFNVRYTF-1NNKIT 168
++ AL+GT + G VE T + Y+S A+ 1+ + +F TF N+IT
Sbjct: 118 -IiWALQGTRW-TGSVEvNLGNATEHFSWSAIEAvTOILVTWGGDFKDVVTFNAENRIT 175

Query: 169 SRYIDIiKKRFGKPTGKQFEHGNNLLKVVYEESTDDI VTCLIGRGKGEEIQHEEAEPKDVE 228
S I + +R G GK+FE +N+ + + VT L GRG +Q E E +
Sbjct: 176 SHQIKI VQRRGVDRGKRFEIDHNI -EQIERTILSYPVTALYGRGAS- -LQGENGE D 228

Query: 229 GHLPQEERRQGYGRRIEFTDWWSVEKGDPIDKPAGQNFVALDSAREEYGLSQNGELKHR 288
G L +F +V W G P+DKP GQ +V A ++YG NG+L HR
Sbjct: 229 GSL DFGEVEWRKSAGAPVDKPKGQLWVGDPEALQKYGRKHNGQLLHR 275

Query: 289 WGVFVNEEIEDKTELLKATWEELQRLSIPIRIYKAEILDIGPETWKGDSVAIIYDEVKIA 348
G+F N I ED ELL+ TWE+LQ+ S P Y+ + +++ +
Sbjct: 276 EGI FQNTNIEDPEELLEKTWEQLQKSSKPE VHYRLSVR LFEHIS - - 319

Query: 349 FETRvDEIDIDKLNFNRSWTLGDYSWQNR ESRSRKEAVQ-NMIDESLETITD 401
+ +LGD ++ +R E +SR A++ +++D + +
Sbjct: 320 GYEHEQASLGDTAIAIDRQFSRPIEIQSRIIAIEYDLVDIDGTGMVE 366

Query: 402 LGMTFQEFLQGIEKRIETGKKEMEDNWRK 430
+G L G+++R+E +E+E N K
Sbjct: 367 MGQFLS--LNGMDERLERIIEEIEKNQGK 393





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2687 



A DNA sequence (GASx991) was identified in S.pyogenes <SEQ ID 7873> which encodes the amino acid 
sequence <SEQ ID 7874>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2584 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA98101 GB:M19348 hyaluronidase [Streptococcus pyogenes phage 
H4489A] 

Identities = 314/371 (84%), Positives = 338/371 (90%), Gaps = 1/371 (0%) 



Query: 1 MAENIPLRVQFKRMKAAEWASSDVVLLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG 60
M ENIPLRVQFKRM A EWA SDV+LLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG
Sbjct: 1 MTENIPLRVQFKRMSADEWARSDVILLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG 60

Query: 61 PKGDTGLQGKTGGTGSRGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK 120
PKGDTGLQGKTGGTG RGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK
Sbjct: 61 PKGDTGLQGKTGGTGPRGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK 120

Query: 121 NAVYLKAESNAKLDEKLNLKGGVMTGQLQFKPN-SGIKPSSSVGGAINIDMSKSEGAAMV 179
+AVY KAES +LD+KL+L GG++TGQLQFKPN SGIKPSSSVGGAINIDMSKSEGAAMV
Sbjct: 121 SAVYSKAESKIELDKKLSLTGGIVTGQLQFKPNKSGIKPSSSVGGAINIDMSKSEGAAMV 180

Query: 180 MYTNKDTTDGPLMILRSNKDTFDQSVQFVDYKGTTI^VNIVMRQPTTPNFSSALNITSAN 239
MYTNKDTTDGPLMILRS+KDTFDQS QFVDY G 1WAVNIVMRQP+ PNFSSALNITSAN
Sbjct: 181 MYTNKDTTDGPLMILRSDKDTFDQSAQFVDYSGKTNAVNIVMRQPSAPNFSSALNITSAN 240

Query: 240 EGGSAMQIRGVEKALGTLKITHENPSVDKEYDKNAAALSIDIVKKQKGGKGTAAQGIYIN 299
EGGSAMQIRGVEKALGTLKITHENP+V+ +YD+NAAALSIDIVKKQKGGKGTAAQGIYIN
Sbjct: 241 EGGSAMQIRGVEKALGTLKITHENPNVEAKYDENAAALSIDIVKKQKGGKGTAAQGIYIN 300

Query: 300 STSGTTGKLLRIRNLNDDKFYVKPDGGFYAKETSQIDGNLKLKDPIANDHAATKAYVDGE 359
STSGT GK+LRIRN N+DKFYV PDGGF++ S + GNL +KDP + HAATK YVD +
Sbjct: 301 STSGTAGKMLRIRNKNEDKFYVGPDGGFHSGANSTVAGNLTVKDPTSGKHAATKDYVDEK 360

Query: 360 VEKLKALIAAK 370
+ +LK L+ K
Sbjct: 361 IAELKKLILKK 371





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2688 

A DNA sequence (GASx993) was identified in S.pyogenes <SEQ ID 7875> which encodes the amino acid 
sequence <SEQ ID 7876>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1358 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2689 

A DNA sequence (GASx995) was identified in S.pyogenes <SEQ ID 7877> which encodes the amino acid 
sequence <SEQ ID 7878>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0855 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC34418 GB:AF158600 gp149 [Streptococcus thermophilus
bacteriophage Sfi11]
Identities = 27/95 (28%), Positives = 50/95 (52%), Gaps = 2/95 (2%)

Query: 9 KYPQLDGTGAVASTHIIIAAEDGAVIPQLIKQDLTSTNDTEIIKAALEEFKKSEYVEIAM 68

K + D +GA +T +1+ DGA +P + + ++TE++K ALE + + + A 
Sbjct: 26 KSKEYDASGAAYATKVILKNRDGAYVPVFLPVEKIDLSNTELLKEALEVIYQENFPQRAE 85 

Query: 69 GEAVQKVDDLEKISQETAKTAKTAQTAAGLAKVSA 103 

E ++D EKI + A+K +TA++S+ 
Sbjct: 86 NEKFNELD--EKIKEYEALSKKATETIAKMEEASS 118 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2690 

A DNA sequence (GASx996) was identified in S.pyogenes <SEQ ID 7879> which encodes the amino acid 
sequence <SEQ ID 7880>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.62 Transmembrane 9 - 25 ( 7 - 26)

Final Results

bacterial membrane Certainty=0.2848 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
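
The INTEGRAL line reports a likelihood score together with two residue ranges. The Python sketch below shows one way to read such a line; it is illustrative only and not part of the analysis described here. The function `parse_transmembrane` and the dictionary keys are hypothetical, and treating the first range as the predicted segment and the parenthesised range as a wider window around it is an assumption.

```python
import re

# Hypothetical helper; field names are chosen for illustration only.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(line):
    """Return the likelihood and residue ranges from one INTEGRAL line."""
    match = TM_LINE.search(line)
    if match is None:
        return None
    start, end, win_start, win_end = (int(match.group(i)) for i in range(2, 6))
    return {
        "likelihood": float(match.group(1)),
        "segment": (start, end),          # first range on the line
        "window": (win_start, win_end),   # parenthesised range (assumed meaning)
        "segment_length": end - start + 1,
    }


line = "INTEGRAL Likelihood = -4.62 Transmembrane 9 - 25 ( 7 - 26)"
print(parse_transmembrane(line))
```

For the line from Example 2690, the sketch reports a segment spanning residues 9 to 25, which is 17 residues long.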

Example 2691

A DNA sequence (GASx997) was identified in S.pyogenes <SEQ ID 7881> which encodes the amino acid
sequence <SEQ ID 7882>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.66 Transmembrane 38 - 54 ( 35 - 55)

Final Results

bacterial membrane Certainty=0.2466 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2692 

A DNA sequence (GASx998R) was identified in S.pyogenes <SEQ ID 7883> which encodes the amino acid 
sequence <SEQ ID 7884>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.87 Transmembrane 47 - 63 ( 41 - 72)

Final Results

bacterial membrane Certainty=0.4949 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2693

A DNA sequence (GASx999) was identified in S.pyogenes <SEQ ID 7885> which encodes the amino acid 
sequence <SEQ ID 7886>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2694 

A DNA sequence (GASx1001) was identified in S.pyogenes <SEQ ID 7887> which encodes the amino acid
sequence <SEQ ID 7888>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -10.51 Transmembrane 18 - 34 ( 16 - 34)

Final Results

bacterial membrane Certainty=0.5203 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2695

A DNA sequence (GASx1002) was identified in S.pyogenes <SEQ ID 7889> which encodes the amino acid
sequence <SEQ ID 7890>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.61 Transmembrane 12 - 28 ( 11 - 33)

Final Results

bacterial membrane Certainty=0.2444 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein is similar to AF186180 from S.equi. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2696 

A DNA sequence (GASx1003) was identified in S.pyogenes <SEQ ID 7891> which encodes the amino acid
sequence <SEQ ID 7892>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein is similar to SeeH from S.equi:

>GP:AAF72809 GB:AF186180 SeeH [Streptococcus equi] Length = 236 

Identities = 233/236 (98%), Positives = 234/236 (98%)

Query: 1 MRYNCRYSHIDKKIYSMIICLSFLLYSNWQANSYNTTNRHNLESLYKHDSNLIEADSIK 60 

MRYNCRYSHIDKKIYSMIICLSFLLYSlWVQflNSYNTTNRHNLESLYKHDSNLIEADSIK 
Sbjct: 1 MRYNCRYSHIDKKIYSMIICLSFLIiYSNVVQANSYNTTNRHl^ESLYKHDSISrLIEADSIK 60 

Query: 61 NSPDIVTSHMLKYSVKDKNLSVFFEKDWISQEFKDKEVDIYALSAQEVCECPGKRYEAFG 120

NSPDIVTSHMLKYSVKDKNLSVFFEKDWISQEFKDKEVDIYALSAQE CECPGKRYEAFG 
Sbjct: 61 NSPDIVTSHMLKYSVKDKNLSVFFEKDWISQEFKDKEVDIYALSAQEACECPGKRYEAFG 120 

Query: 121 GITLTMSEKKEIK^PVNVWDKSKQQPPMFITVMPKOTAQEVDIKTOKLLIKKYDIYmiR 180 
GITLTNSEKKEIKVP+NVWDKSKQ PPMFITWKPKVTAQEVDIKVRKLLIKKYDIYNNR

Sbjct: 121 GITLTNSEKKEIKVPINVWDKSKQHPPMFITVNKPKVTAQEVDIKVRKLLIKKYDIYMKR 180 

Query: 181 EQKYSKGTVTLDLNSGKDI VFDLYYFGNGDFNSMLKI YSNNERIDSTQFHVDVS I S 236 
EQKYSKGTVTLDLNSGKDI VFDLYYFGNGDFNSMLKI YSNNERIDSTQFHVDVS I S 
Sbjct: 181 EQKYSKGTVTLDLNSGKDI VFDLYYFGNGDFNSMLKI YSNNERIDSTQFHVDVS I S 236

There is also homology to a S.aureus enterotoxin: 

>GP:AAA19777 GB:U11702 enterotoxin H [Staphylococcus aureus] 
Identities = 70/215 (32%), Positives = 108/215 (49%), Gaps = 19/215 (8%)

Query: 27 SNWQANSYNTTNRHNLESLYKHDSNLIEADSI - KNSPDIVTSHMLKYSVKDKNLSVFFE 85 

+++ AN+Y N ++ KD EDI+ND +K++ D 
Sbjct: 34 TDLALANAYGQYNHPFIKENIK3DEISGEKDLIFRNQGDSGNDLRVKFATAD 85 

Query: 86 KDWISQEFKDKEVDIYALSAQEVCECPGKRYEA--FGGITLTNSEK--KEIKVPVNVWDK 141

++Q+FK+K VDIY S CE + +GG TL NSEK +E + NVW 

Sbjct: 86 IAQKFKNKNVDIYGASFYYKCEKISENISECLYGGTTL-NSEKLAQERVIGANVWVD 141 

Query: 142 S KQQPPMFI TVNKPKVTAQEVD I KVRKLL I KKYD I YNNREQKYSKGTVTLDLNSGKD I VF 201 
Q+ I NK VT QE+DIK+RK+L KY IY ++ + SKG + D+ + +D F

Sbjct: 142 GIQKETELIRTNKKNVTLQELDIKIRKILSDKYKIY-YKDSEISKGLIEFDMKTPRDYSF 200 

Query: 202 DLYYFGNGDFNSMLKIYSNNERIDSTQF-HVDVSI 235 
D+Y + + KIY +N+ + S H+DV++ 

Sbjct: 201 DIYDLKGENDYEIDKIYEDNKTLKSDDISHIDVNL 235

>GP:AAC26661 GB:AF064774 extracellular enterotoxin type I precursor 
[Staphylococcus aureus] 
Identities = 68/214 (31%), Positives = 109/214 (50%), Gaps = 27/214 (12%)

Query: 42 NLESLY-KHDSNLIEADSIKNSPDIVTSHMLKYSVKDKNLSVFFEKDWIS-QEFKDKEVD 99 

NL + Y KHD ++ + KN P ++ L++S +L + +W +FK K++D 
Sbjct: 32 NLRNFYTKHDYIDLKGVTDKNLP IANQLEFSTGTNDL- ISESNNWDEISKFKGKKLD 87 

Query: 100 IYALSAQEVCECPGKRYEAFGGITLTNSEKKEI -KVPVNVWDKSKQQPPMF- - ITVNKPK 156
1+ + C K +GG TL+ K+P+N+W K + I NK 

Sbjct: 88 IFGIDYNGPC KSKYMYGGATLSGQYLNSARKIPINLWVNGKHKTISTDKIATNKKL 143 

Query: 157 VTAQEVDIKVRKLLIKKYDIYNNRE QKYSKGTVTLDLNSGKDIVFD 202 

VTAQE+D+K+R+ L ++Y+IY + ++ G V LN+ K +D

Sbjct: 144 VTAQEIDVKLRRYLQEEYNIYGHNNTGKGKEYGYKSKFYSGFNNGKVLFHLNNEKSFSYD 203 

Query: 203 LYYFGNGDFNSMLKIYSNNERIDSTQFHVDVSIS 236 
L+Y G+G S LKIY +N+ I+S +FH+DV IS 
Sbjct: 204 LFYTGDGLPVSFLKIYEDNKIIESEKFHLDVEIS 237

>GP:AAC28968 GB:U93688 enterotoxin [Staphylococcus aureus] 
Identities = 70/244 (28%), Positives = 127/244 (51%), Gaps = 27/244 (11%)

Query: 12 KKIYSMIICLSFLLYSNWQANSYNTTNRHNLESLYKHDSNLIEADSIKNSPDIVTSHML 71
KK+ S+++ ++ ++ A++ NL + Y + ++ +K++ D ++ L

Sbjct: 2 KKLISILL-INIIILGVSNNASAQGDIGIDmiENFTrK-KDFVDLKDVKDN-DTPIANQL 58 

Query: 72 KYSVKDKNLSVFFEKDWIS-QEFKDKEVDIYALSAQEVCECPGKRYEAFG6ITLTNSE-K 129 
++S + +L + KD+ FK K++D++ +S C +Y +GG+T TN

Sbjct: 59 QFSNESYDL- ISESKDFNKFSNFKGKKLDVFGISYNGQCNT KY- IYGGVTATNEYLD 113 

Query: 130 KEIKVPVNVW- -DKSKQQPPMFITVNKPKVTAQEVDIKVRKLLIKJCYDIYHNREQK 183 

K +P+N+W K ++ NK VTAQE+D+K+RK L ++Y+IY + K 

Sbjct: 114 KSRNIPINIWINGNHKTISTOKVSTNKKLVTAQEIDVKLRKYLQEEYNIYGHNGTKKGEE 173

Query: 184 YSKGTVTLDIiNSGKDIVFDLYYFG-NGDFNSMLKIYSNNERIDSTQFHVD 232 

++ G VT LN+ +DL+Y G +G S LKIY +N+ ++S +FH+D 

Sbjct: 174 YGHKSKFYSGFNIGKVTFHLNNNDTFSYDLFYTGDDGLPKSFLKIYEDNKTVESEKFHLD 233

Query: 233 VSIS 236
V IS 

Sbjct: 234 VDIS 237 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2697 

A DNA sequence (GASx1004R) was identified in S.pyogenes <SEQ ID 7893> which encodes the amino
acid sequence <SEQ ID 7894>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.18 Transmembrane 12 - 28 ( 12 - 28)

Final Results

bacterial membrane Certainty=0.1871 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2698 

A DNA sequence (GASx1009) was identified in S.pyogenes <SEQ ID 7895> which encodes the amino acid
sequence <SEQ ID 7896>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.6391 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2699 

A DNA sequence (GASx1011) was identified in S.pyogenes <SEQ ID 7897> which encodes the amino acid
sequence <SEQ ID 7898>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4528 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2700 

A DNA sequence (GASx1024) was identified in S.pyogenes <SEQ ID 7899> which encodes the amino acid
sequence <SEQ ID 7900>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2701

A DNA sequence (GASx1033) was identified in S.pyogenes <SEQ ID 7901> which encodes the amino acid
sequence <SEQ ID 7902>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1652 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2702 

A DNA sequence (GASx1039) was identified in S.pyogenes <SEQ ID 7903> which encodes the amino acid
sequence <SEQ ID 7904>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 15 - 31 ( 15 - 31)

Final Results

bacterial membrane Certainty=0.1425 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2703 

A DNA sequence (GASx1058) was identified in S.pyogenes <SEQ ID 7905> which encodes the amino acid
sequence <SEQ ID 7906>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5484 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2704 

A DNA sequence (GASx1077) was identified in S.pyogenes <SEQ ID 7907> which encodes the amino acid
sequence <SEQ ID 7908>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4848 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2705

A DNA sequence (GASx1080) was identified in S.pyogenes <SEQ ID 7909> which encodes the amino acid
sequence <SEQ ID 7910>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.44 Transmembrane 55 - 71 ( 55 - 72)

Final Results

bacterial membrane Certainty=0.5967 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2706 

A DNA sequence (GASx1081) was identified in S.pyogenes <SEQ ID 7911> which encodes the amino acid
sequence <SEQ ID 7912>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-13.00 Transmembrane 103 - 119 ( 91 - 129)

INTEGRAL Likelihood =-11.46 Transmembrane 208 - 224 ( 203 - 230)

INTEGRAL Likelihood = -8.28 Transmembrane 54 - 70 ( 46 - 71)

INTEGRAL Likelihood = -5.79 Transmembrane 160 - 176 ( 155 - 181)

INTEGRAL Likelihood = -4.25 Transmembrane 127 - 143 ( 125 - 149)

Final Results

bacterial membrane Certainty=0.6201 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2707 

A DNA sequence (GASx1089) was identified in S.pyogenes <SEQ ID 7913> which encodes the amino acid
sequence <SEQ ID 7914>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2999 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2708 

A DNA sequence (GASx1109) was identified in S.pyogenes <SEQ ID 7915> which encodes the amino acid
sequence <SEQ ID 7916>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1270 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2709 

A DNA sequence (GASx1114R) was identified in S.pyogenes <SEQ ID 7917> which encodes the amino
acid sequence <SEQ ID 7918>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4021 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2710

A DNA sequence (GASx1149) was identified in S.pyogenes <SEQ ID 7919> which encodes the amino acid
sequence <SEQ ID 7920>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.70 Transmembrane 12 - 28 ( 12 - 29)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2711 

A DNA sequence (GASx1150) was identified in S.pyogenes <SEQ ID 7921> which encodes the amino acid
sequence <SEQ ID 7922>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2712 

A DNA sequence (GASx1160) was identified in S.pyogenes <SEQ ID 7923> which encodes the amino acid
sequence <SEQ ID 7924>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.19 Transmembrane 15 - 31 ( 15 - 31)

Final Results

bacterial membrane Certainty=0.2275 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2713 

A DNA sequence (GASx1167) was identified in S.pyogenes <SEQ ID 7925> which encodes the amino acid
sequence <SEQ ID 7926>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1404 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99233 GB:U67563 oxaloacetate decarboxylase alpha chain (oadA) 
[Methanococcus jannaschii] 
Identities = 250/453 (55%), Positives = 325/453 (71%), Gaps = 7/453 (1%)

Query: 13 VAITETVLRDGHQSLMATRLSIEDMLPVLTILDKIGYYSLECWGGATFDACIRFIiNEDPW 72
V I +T RD QSL+ATR+ EDMLP+ +D++G+YS+E WGGATFDACIR+LNEDPW
Sbjct: 2 WIVDTTFRDAQQSLIATRMRTEDMLPIAEKMDEVGFYSMEVWGGATFDACIRYLNEDPW 61

Query: 73 ERLRTLKKGLPNTRLQMLLRGCjNLLGYRHYADDIVDKFISLSAQNGIDVFRIFDAIaNDPR 132
ERLR LICK + NT LQMLLRGQNL+GYRHY DDIV+KF+ + +NGID+FRIFDALND R
Sbjct: 62 ERLRALKKRIQNTPLQMLLRGCm>WYRHYPDDIvBKFVIKAHENGIDIFRIFDALNDVR 121

Query: 133 NIQQALRAVKKTGKEAQLCIAYTTSPTOTLNYYLSLVKELVEMGADSICII<DMAGILTPK 192
N++ A++ KK G E Q I YT SPVHT++ Y+ L K+L EMG DSICIKDMAG+LTP
Sbjct: 122 NMETAI KTAKKVGAEVQGAI CYT I SPVHTIDQYVELAKKLEEMGCDS I C I KDMAGLLTPY 181

Query: 193 AAKELVSGIKAMTNLPLIVHTHATSGISQMTYLAAVEAGADRIDTALSPFSEGTSQPATE 252
ELV +K +LP+ VH+H TSG++ MTYL +EAGAD +D A+SPF+ GTSQP TE
Sbjct: 182 EGYELVTCRLKEEISLPIDVHSHCTSGLAPMTYLICVIEAGADMVDCAISPFAMGTSQPPTE 241

Query: 253 SMYLALKEASYDITLDETLLEQAANHLRQARQKYLADGILDPSLLFPDPRTLQYQVPGGM 312
S+ +ALK YD LD LL + ++ + R+KY + P D R L YQVPGGM
Sbjct: 242 SI WALKGTKYDTGLDLKLLNEIRDYFMKVREKYKM- - LFSPISQI VDARVLVYQVPGGM 299

Query: 313 LSNMLSQLKQANAESKLEEVLAEVPRWKDLGYPPLTCPLSQMVGTQAAMNVILGKPYQM 372
LSN++SQLK+ A K EEVL E+PRVRKDLGYPPLVTP SQ+VGTQA +NV+ + Y++
Sbjct: 300 LSNLVSQLKEQGALDKFEEVLQEIPRTOKDLGYPPLOTPTSQIVGTQAVLNVLTEERYKI 359

Query: 373 VSKEIKQYLAGDYGKTPAPVNEDLKRSQI - -GSAPVTTNRPADQLSPEFEVLK- -AEVAD 428
++ E+ Y+ G YGK PAP+N +L + + G P+T RPAD L PE+E +K AE
Sbjct: 360 ITNEVWYVXGFYGKPPAPINPELLKRVIJ3EGEKPITC-RPADLLPPEWEKVKKEAEEKG 418

Query: 429 LAQTDEDVLTYALFPSVAKPFLTTKYQTDDVIK 461
+ + +ED+LTYAL+P +A FL + + + + K
Sbjct: 419 IVKKEEDILTYALYPQIAVKFLRGELKAEPIPK 451

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 
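
For illustration only, the summary line that heads each alignment (Identities, Positives, Gaps) can be parsed and the ratios recomputed from the counts. The sketch below is written in Python and assumes the summary has the form "Name = count/length (NN%)" as printed in this document; `Ratio` and `parse_summary` are hypothetical names introduced here, not part of the homology search software.

```python
import re
from typing import Dict, NamedTuple


class Ratio(NamedTuple):
    """One field of an alignment summary line."""
    count: int
    length: int
    reported_percent: int

    @property
    def computed_percent(self) -> float:
        # Percentage recomputed from the raw counts, without rounding.
        return 100.0 * self.count / self.length


# Tolerates the stray spaces seen in extracted text, e.g. "(55%) , Positives".
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line: str) -> Dict[str, Ratio]:
    """Map each field name found in the line to its Ratio."""
    return {
        name: Ratio(int(count), int(length), int(percent))
        for name, count, length, percent in FIELD_RE.findall(line)
    }


summary = parse_summary(
    "Identities = 250/453 (55%) , Positives = 325/453 (71%) , Gaps = 7/453 (1%)"
)
for name, ratio in summary.items():
    print(name, ratio.count, ratio.length, round(ratio.computed_percent, 2))
```

Comparing `computed_percent` with `reported_percent` gives a quick consistency check on a transcribed summary line; how the search software itself rounds the reported value is not stated in this document, so the sketch makes no assumption about it.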

Example 2714 

A DNA sequence (GASx1168) was identified in S.pyogenes <SEQ ID 7927> which encodes the amino acid
sequence <SEQ ID 7928>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.11 Transmembrane 16 - 32 ( 2 - 34)

Final Results

bacterial membrane Certainty=0.3845 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2715

A DNA sequence (GASx1170) was identified in S.pyogenes <SEQ ID 7929> which encodes the amino acid
sequence <SEQ ID 7930>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.06 Transmembrane 211 - 227 ( 208 - 238)

INTEGRAL Likelihood = -5.84 Transmembrane 117 - 133 ( 110 - 136)

INTEGRAL Likelihood = -5.36 Transmembrane 256 - 272 ( 253 - 274)

INTEGRAL Likelihood = -4.67 Transmembrane 44 - 60 ( 41 - 64)

INTEGRAL Likelihood = -4.19 Transmembrane 287 - 303 ( 287 - 306)

INTEGRAL Likelihood = -3.77 Transmembrane 358 - 374 ( 357 - 375)

INTEGRAL Likelihood = -2.18 Transmembrane 20 - 36 ( 16 - 38)

INTEGRAL Likelihood = -0.85 Transmembrane 90 - 106 ( 90 - 106)

INTEGRAL Likelihood = -0.53 Transmembrane 165 - 181 ( 164 - 181)

Final Results

bacterial membrane Certainty=0.3824 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA05140 GB:AJ002015 methylmalonyl-CoA decarboxylase,
beta-subunit [Propionigenium modestum]
Identities = 231/395 (58%), Positives = 293/395 (73%), Gaps = 19/395 (4%)

Query: 1 MLDVLNQ^IVQSSGLAHLTVNNLIMICLASFFLYLGIKKEYEPYL^IVPIAFGILLVNLP^'IA 60
ML + S+G L + ++IM+ +A FLYL I KE+EP L+VPI+FGILL NLP A
Sbjct: 1 MLQAILDFYHSTGFYGLNMGSIIMMLVACVFLYLAIAKEFEPLLLVPISFGILLTNLPFA 60

Query: 61 GLMDHP ANG NPGGLLYYLYKGTSLGIYPPLIFLCLGASTDFG 102
G+M P A+G PGGLLYYL++G LGI+PPLIFL +GA TDFG
Sbjct: 61 GMMAEPLLEVHEKLSASGAHLYTAHTAEPGGLLYYLFQGDHLGI FPPLI FLGVGAMTDFG 120

Query: 103 PLIANPKTILLGGAAQVGIFLAFFIiAIMLGM-TPQEMSVGIIGGADGPTAIYVTTKLAP 161
PLI+NPK++LLG AAQ GIF+ FF AI G+ T QEAAS+GIIGGADGPTAI++++KLAP
Sbjct: 121 PLISNPKSLLLGAAAQFGI FVTFFGAIASGLFTAQEAASIGI IGGADGPTAI FLSSKLAP 180

Query: 162 DLLSTIAIjAAYSYMALVPIIQPPIIKLLTTKAERQvraffQARTVSQKEKIIFPIMVTIFV 221
L+ IA+AAYSYMALVPI IQPPI+ LT++ ER++KM+Q R VS++EKIIFPI+VTI V
Sbjct: 181 HLMGPIAVAAYSYMALVPIIQPPIMTALTSETERKIKMSQLRLVSKREKIIFPIWTILV 240

Query: 222 SLLVPSATTLVGCLMLGNLVREIKIVPKIVENLQQvVMFCITIILGLTVGAKANGDLFLS 281
SL+VP A TLVG LMLGNL RE +V ++ + + ++ ITI LG+TVGA A + FL
Sbjct: 241 SLI VPPAATLVGMLMLGNLFRECGWGRLEDTAKNAIjINI ITI FLGVTVGATATAEAFLK 300

Query: 282 ATTLKIIALGLIAFAAGTAGGVLMGKVMYYLSGNKVKPMIGAAGVSAVPMAARWQKIGQ 341
TL 1+ LG++AF GT GVL+ K M LS +NP++G+AGVSAVPMAARV Q +GQ
Sbjct: 301 VETIAILGLGIVAFGIGTGSGVLLAKFMNKLSKEPINPLLGSAGVSAVPMAARVSQWGQ 360

Query: 342 EEDPSNFLLMHAMGPNVAGVIGSAIASGALLAFFG 376
+ DP+NFLLMHAMGPNVAGVIGSA+++G LL+ FG
Sbjct: 361 KADPTNFLLMHAMGPNVAGVIGSAVSAGVLLSLFG 395

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2716 

A DNA sequence (GASx1171R) was identified in S.pyogenes <SEQ ID 7931> which encodes the amino
acid sequence <SEQ ID 7932>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0851 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF93965 GB:AE004165 citG protein [Vibrio cholerae] 
Identities = 100/287 (34%), Positives = 154/287 (52%), Gaps = 12/287 (4%)

Query: 9 ISQLALKALLYEVSLSPKPGLVDRFDNGAHDDMSFITFIDSMIALSPFFQAYIETGFAYA 68
+ IA A++ EV L+PKPGLVD +NGAH DM TFI S A++P+ +++ G+ A
Sbjct: 32 VGHIAYHAMMLEVHLTPKPGLVDTANNGAHRDMDLNTFIASAEAIAPYLHSFVSAGWESA 91

Query: 69 KEEPLLLFNRLRQLGQKAEETMFC^TQGINTHKGI^FSMABLLGATGAYLARTPHLMTDL 128
L + LR +G +AE+ MF ATQG+NTHKG+ F + L+ G+ G A
Sbjct: 92 GNPAAQLLSALRPIGIEAEQAMFAATQGVNTHKGMIFILGLICGSVGWLKANQ 144

Query: 129 GRFSKEDTIAICRLWPMTAHLIQTDLGHIiNTKKEFTYGEQLFVTYGIKGPRGEASEGFT 188
K D I ++ L+ +L + T GE+++ YG+ G RGEA+ G
Sbjct: 145 - - - LKIDAQHIGETIRQACQFLVIDELKAKRDCEPETAGERI YRQYGLTGARGEAASGLA 201

Query: 189 TLTDHALPYFRQMISQN-DPETSQLRLLVYLMSIVEDGNLIHRGGIEAWKGVKAD-MRLL 246
+ HALP ++ +++ E + L+ LM+ D NL+ RGG+ V+ +LL
Sbjct: 202 MVMQHALPAYQACLTKGASTEQALTOTLL VIjMANNNDSNLVSRGGLAGLHFVQEQAQQLL 261

Query: 247 LQQDLSTTDLRLALSSYNQCLINQHLSPGGAADLLALTFYFAFLEKL 293
+ ++ AL++ + LI +HLSPGG+ADLLA T+ L +L
Sbjct: 262 AKGGFLYQEIEQALTALDSVLIEKHLSPGGSADLLRATWLIYELVQL 308

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.



Example 2717 

A DNA sequence (GASx1172R) was identified in S.pyogenes <SEQ ID 7933> which encodes the amino
acid sequence <SEQ ID 7934>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2501 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12389 GB:Z99107 similar to transcriptional regulator (GntR 
family) [Bacillus subtilis] 
Identities = 60/205 (29%), Positives = 99/205 (48%), Gaps = 3/205 (1%)

Query: 19 PLKIAFYNALKKTI 1LRQI PVGSRINEKEFS IALNI SRTPIRYALGLLSEEHLVEHI PKK 78
P + FYN LKK I G RINE + + + +SR+PIR A+ LL ++ L++ +
Sbjct: 11 PYYLQFYNQLKKMI FNGTFKPGERINETQLAKSFGVSRSPIREAMRLLEKDGLLKADDRN 70

Query: 79 GIIVKGVSIKDACEIFEIRKALETIATVQAMHLMTEEDFKVMHNLLEDCETFI - -AEDDT 136
G + ++ KD EI++IR LE LA + EE+ ++ LE+ E I +DT
Sbjct: 71 GFSITSLTAKDVDEIYKIRIPLEQLAVEIiVIDEADEEELTILEKQLEETEKAIHNGTEDT 130

Query: 137 NRILDNFNAFNNLIYSYSQMVRLKEIVTELQAYLVYFRKISISSVERRKRALSEHWMIYR 196
IN F+ L+ +S LK ++ + + + R ++ + R + L EH 1+
Sbjct: 131 EIIRLN-QKFHELLVDFSHNRHLKNLLEHVNDLIHFCRILNYTGDHRAETILREHRRIFE 189

Query: 197 GMKNKDHEQITLITHEHLNSSLEFI 221
+K K+ E H N E +
Sbjct: 190 EVKKKNKEAAKQHVLAHFNHDCEHL 214

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.



Example 2718 

A DNA sequence (GASx1173R) was identified in S.pyogenes <SEQ ID 7935> which encodes the amino
acid sequence <SEQ ID 7936>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.99 Transmembrane 450 - 466 ( 445 - 473)

INTEGRAL Likelihood = -9.61 Transmembrane 33 - 49 ( 30 - 55)

INTEGRAL Likelihood = -8.55 Transmembrane 326 - 342 ( 321 - 346)

INTEGRAL Likelihood = -7.01 Transmembrane 288 - 304 ( 286 - 311)

INTEGRAL Likelihood = -6.79 Transmembrane 95 - 111 ( 88 - 114)

INTEGRAL Likelihood = -4.99 Transmembrane 265 - 281 ( 264 - 285)

INTEGRAL Likelihood = -4.62 Transmembrane 208 - 224 ( 204 - 228)

INTEGRAL Likelihood = -3.13 Transmembrane 126 - 142 ( 126 - 145)

INTEGRAL Likelihood = -2.81 Transmembrane 366 - 382 ( 365 - 383)

INTEGRAL Likelihood = -2.34 Transmembrane 419 - 435 ( 417 - 438)

Final Results

bacterial membrane Certainty=0.5394 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9169> which encodes the amino acid sequence 
<SEQ ID 9170>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 39 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.99 Transmembrane 443 - 459 ( 438 - 466)

INTEGRAL Likelihood = -8.55 Transmembrane 319 - 335 ( 314 - 339)

INTEGRAL Likelihood = -7.01 Transmembrane 281 - 297 ( 279 - 304)

INTEGRAL Likelihood = -6.79 Transmembrane 88 - 104 ( 81 - 107)

INTEGRAL Likelihood = -4.99 Transmembrane 258 - 274 ( 257 - 278)

INTEGRAL Likelihood = -4.62 Transmembrane 201 - 217 ( 197 - 221)

INTEGRAL Likelihood = -3.13 Transmembrane 119 - 135 ( 119 - 138)

INTEGRAL Likelihood = -2.81 Transmembrane 359 - 375 ( 358 - 376)

INTEGRAL Likelihood = -2.34 Transmembrane 412 - 428 ( 410 - 431)

Final Results

bacterial membrane Certainty=0.539 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG08853 GB:AE004959 probable citrate transporter [Pseudomonas aeruginosa] 
Identities = 199/468 (42%), Positives = 296/468 (62%), Gaps = 41/468 (8%) 

Query: 9 LLTMLAYAMIIVFMYVVMKKKMTPFTALVMIPLIMTIAVILTGSADFNADAKFVAFVGDG 68 
+LT+LA+AM+ FM+++M K+++ AL+++P +AF G 
Sbjct: 1 MLTLLAFAMVATFMFL IMTKRLSAL I ALI LVP IAFALIG 39 

Query: 69 GIAKDLTAIGP^WMYGINNTAKTGIMLLFAILFFSVMLDAGLFDPITEKMIRFAKGDPMK 128 
GAL GPM++ GI A TG+ML+FAIL+F++M+D+GLFDP K++R KGDP+K 
Sbjct: 40 GFAAGL---GPMMLDGIRTLAPTGVMLMFAILYFAIMIDSGLFDPAVRKILRLVKGDPLK 96 

Query: 129 VLIATAVVAAAVSLNGDGTTTTLICCSAFLPIYKKLDMKIMNLGVLIILQNTIMNLLPWG 188 
V + TA +A VSL+GDG+TT +IC +A LP+Y +L M + + LI+L + ++N+ PWG 
Sbjct: 97 VSLGTAAIAMIVSLDGDGSTTYMICTAAVLPLYSRLGMSPLVMACLIMLSSGVIJsIMTPWG 156 

Query: 189 GPTARAMSVLGVGP-EILGYLAPGMILSLL--YVICWVAPSMGRKERARLGVIDL--SEE 243 
GPTARA S L V P +1 + P MI LL + I W+ G++ERARLG + L E 
Sbjct: 157 GPTARAASALHVDPADIFVPMIPAMIAGLLAIFAIAWI YGKRERARLGELHLPTDHE 213 

Query: 244 DMRQLTDITDPDTLFIRRPKNFVFNAILTIGLITWLVAGSFNKSIAMAPLLLFAVGTCIA 303 
D+ +++ P+ RRPK FNAILT+ L+ L+AG + M L + A G IA 
Sbjct: 214 DLAEISVSQYPEA---RRPKLLWFNAILTWLMATLIAGL LPMPVLFMIAFG--IA 264 

Query: 304 LMVNYPVLKDQSKRIGDNAGDAVQWILVFAAGIFMGLFQGSGMASALAQSFATIIPKQL 363 
++VNYP +++Q KRIG +A + + W L+FAAG+F G+ G+GM A+++S +IP L 
Sbjct: 265 MIvNYPCIQEQKKRIGAHAENILAWSLIFAAGVFTGILSGTGMVDAMSKSLLAVIPPAL 324 

Query: 364 AGFWGLVIALVSAPGTFFISNDGFYYGILPVLAFAGAEYGFSNMAMALASLMGQAFHLLS 423 
+ + ALVS P TFF+SND FYYG+LP+L +A AEYG + + MA AS++GQ HLLS 
Sbjct: 325 GPYLATITALVSMPFTFFMSNDAFYYGVLPILTQAAAEYGITPVEMARASIVGQPVHLLS 384 

Query: 424 PLVAFIYLLLRLTGLDMGEWQKEAAKYALIIFVIFWTIIAMGQMPLY 471 
PLV YLL+ L +D G+ Q+ K+A+++ + + + +G PL+ 
Sbjct: 385 PLVPSTYLLVGLAKIDFGDHQRFTLKWAVLVCLAILAMALLLGLFPLF 432 





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2719 

A DNA sequence (GASx1174) was identified in S.pyogenes <SEQ ID 7937> which encodes the amino acid 
sequence <SEQ ID 7938>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3948 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2720 

A DNA sequence (GASx1175) was identified in S.pyogenes <SEQ ID 7939> which encodes the amino acid 
sequence <SEQ ID 7940>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3519 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2721 

A DNA sequence (GASx1177) was identified in S.pyogenes <SEQ ID 7941> which encodes the amino acid 
sequence <SEQ ID 7942>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood = -9.24 Transmembrane 115 - 131 ( 105 - 137) 
INTEGRAL Likelihood = -8.92 Transmembrane 208 - 224 ( 204 - 238) 
INTEGRAL Likelihood = -7.80 Transmembrane 282 - 298 ( 273 - 303) 
INTEGRAL Likelihood = -4.94 Transmembrane 85 - 101 ( 75 - 102) 
INTEGRAL Likelihood = -4.04 Transmembrane 10 - 26 ( 3 - 32) 
INTEGRAL Likelihood = -3.61 Transmembrane 255 - 271 ( 253 - 271) 

Final Results 

bacterial membrane Certainty=0.4694 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB89172 GB:AE000960 oxaloacetate decarboxylase, sodium ion pump 
subunit (oadB) [Archaeoglobus fulgidus] 
Identities = 190/354 (53%), Positives = 255/354 (71%), Gaps = 8/354 (2%) 

Query: 16 IVMWIGALLMYLGIKKEYEPTLLVPMGLGTILVNFPGSGvIjTQvWGvEQEGVFEALFN 75 
+VM+ +G LL+YLGI K+ EP LLVP+G+G ILVN PG G+ E+ +F+ 
Sbjct: 5 LVMIGVGLLLVYLGIVKKMEPLLLVPIGIGAILvNIPGGGL AEEGS I FDLFLK 57 

Query: 76 FGIGTELFPLLIFIGIGAMIDFGPLLQNPFMLLFGDAAQFGIFFWWAVLAGFDIKKAA 135 
+ I TE+ PLLIF+G+GA+ DF PLL NP L G AAQ GIF ++ A+ GF +EAA 
Sbjct: 58 YLIHTEIVPLLIFLGLGALTDFSPLLANPKTFLLGAAAQIGIFAALIAALFLGFTPQEAA 117 

Query: 136 SIGIIGAADGPTSIFVANQIAKDLLGPITVAAYSYMALVPIIQPFAIKLWTKKERRIRM 195 
SIGIIG ADGPT+I+ LA LL V7AAYSYM+LVPI I QP IK +T+ +ER+I+M 
Sbjct: 118 SIGIIGGADGPTTIYTTTIIAPHLLAATAVAAYSYMSLVPIIQPPIIKALTSSRERKIKM 177 

Query: 196 TYKAENVSQMTKILFPI I ITLVAGFIAPISLPLVGFLMFGNLLRECGVLDRLSQTAQNEL 255 
+ VS+ KILFPI +++GF+AP +LPLVG LM GNL RE GV DRL++ A EL 
Sbjct: 178 R-QLRIVSKKEKILFPIATIIISGFLAPKALPLVGMLMTGNLFRESGVTDRLAKGASEEL 236 

Query: 256 VNI Ifc> ILLGLTIS IKMQADLFLNVQTLIiI IVFGLIiA.FIMDb lGGWli* AKFLNLt KKEK1JN 315 
+NI++I+LGL++ M+A+ FL +TLL++ G++AF + GGV+ AK +NLF KEKIN 
Sbjct: 237 miMTIILGLSVGSTMRAESFLTQKTLLVLALGWAFAAATAGGVLLAKVMNLFLKEKIN 296 

Query: 316 PMIGAAGISAFPMSSRVIQKMATDEDPQNFILMyAVGANVSGQIASVIAGRTiTiTi 369 
PMIGAAG+SA PMS+RV+Q++A +EDP N ILM+A+G NV+G I S +A G+L+ 
Sbjct: 297 PMIGAAGVSAVPMSARWQRLAIEEDPHNHILMHAMGPNVAGVIGSAVAAGVLI 350 




Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 



Example 2722 

A DNA sequence (GASx1178) was identified in S.pyogenes <SEQ ID 7943> which encodes the amino acid 
sequence <SEQ ID 7944>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -9.50 Transmembrane 21 - 37 ( 8 - 43) 

Final Results 

bacterial membrane Certainty=0.4800 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
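
The localisation reports in these examples share one line layout, so they can be read mechanically. The following is a minimal sketch only: the function name `parse_report`, the two regular expressions and the `SAMPLE` constant are illustrative assumptions rather than part of the analysis software used here, and the sample text simply restates the values printed for Example 2722.

```python
import re

# One predicted membrane-spanning segment: likelihood, then first and last residue.
TM_ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)
# One "Final Results" row; stray spaces inside the number are tolerated.
LOC_ROW = re.compile(r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)")

# Values restated from Example 2722 (hypothetical constant name).
SAMPLE = """\
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.50 Transmembrane 21 - 37 ( 8 - 43)
Final Results
bacterial membrane Certainty=0.4800 (Affirmative)
bacterial outside Certainty=0.0000 (Not Clear)
bacterial cytoplasm Certainty=0.0000 (Not Clear)
"""


def parse_report(text):
    """Return (segments, locations) extracted from one report.

    segments  -- list of (likelihood, first_residue, last_residue)
    locations -- dict mapping compartment name to its certainty value
    """
    segments = [
        (float(score), int(first), int(last))
        for score, first, last in TM_ROW.findall(text)
    ]
    locations = {
        name: float(value.replace(" ", ""))
        for name, value, _label in LOC_ROW.findall(text)
    }
    return segments, locations


if __name__ == "__main__":
    segments, locations = parse_report(SAMPLE)
    print(segments)
    print(max(locations, key=locations.get))  # compartment with the highest certainty
```

Taking the compartment with the highest certainty is a convenience for tabulating many examples; it adds nothing beyond the "(Affirmative)" label already printed in each report.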

Example 2723 

A DNA sequence (GASx1179) was identified in S.pyogenes <SEQ ID 7945> which encodes the amino acid 
sequence <SEQ ID 7946>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1906 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 
>GP:AAF93961 GB:AE004165 citrate lyase, gamma subunit [Vibrio cholerae] 
Identities = 46/97 (47%), Positives = 64/97 (65%) 

Query: 1 MDIKQTAVAGSLESSDLMITVSPNDEQTITITLDSSVEKQFGNHIRQLIHQTLVNLKVTA 60 
MIA AG+LESSDL + + PN++ I + LDS+VE+QFG+ IRQ++ TL ++V 
Sbjct: 1 MKIAHPAFAGTLESSDLQTOIEPNITOGGIELvIjDSTvEQQFGHAIRQvA?LHTLDAMQvRD 60 

Query: 61 AKVEAVDKGALDCTIQARTIAAVHRAAGIDQYDWKEI 97 
A V DKGALDC I+AR AAV RA + +W ++ 
Sbjct: 61 ALVTIEDKGALDCVIRARVQAAVMRACDVQNIEWSQL 97 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
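
The "Identities" figure in an alignment header counts the columns in which query and subject carry the same residue. As a hedged illustration, the sketch below recounts such columns for the second block of the Example 2723 alignment (residues 61 to 97) and recomputes the two header percentages from their raw counts. The function names are hypothetical, and truncating the percentage to a whole number is an assumption that happens to reproduce the two figures printed for this example, not a documented rule.

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned columns where both rows show the same residue (gaps never match)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))


def whole_percent(count: int, total: int) -> int:
    """Percentage truncated to a whole number (assumed convention, see the lead-in)."""
    return count * 100 // total


# Second alignment block of Example 2723, copied from the rows above.
QUERY_61_97 = "AKVEAVDKGALDCTIQARTIAAVHRAAGIDQYDWKEI"
SBJCT_61_97 = "ALVTIEDKGALDCVIRARVQAAVMRACDVQNIEWSQL"

if __name__ == "__main__":
    # Identical columns within this block only, not the whole alignment.
    print(count_identities(QUERY_61_97, SBJCT_61_97))
    # Header counts for the full alignment: identities 46/97, positives 64/97.
    print(whole_percent(46, 97), whole_percent(64, 97))
```

The first block of the same alignment is not recounted here because several characters in its subject row were corrupted during text extraction.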

Example 2724 

A DNA sequence (GASx1181) was identified in S.pyogenes <SEQ ID 7947> which encodes the amino acid 
sequence <SEQ ID 7948>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.65 Transmembrane 74 - 90 ( 74 - 90) 

Final Results 

bacterial membrane Certainty=0.1659 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA71632 GB:Y10621 CILB, citryl-CoA lyase beta subunit 
[Leuconostoc mesenteroides] 

Identities = 187/293 (63%), Positives = 237/293 (80%), Gaps = 1/293 (0%) 

Query: 2 ERLRRTMMFVPGANAAMLRDAPLFGADSIMFDLEDSVSLKEKDTSRALVHFALKTFDYSS 61 
ERLRRTMMFVPG N AM++DA +FGADSIMFDLED+VSL EKD++R LV+ AL+T DY S 
Sbjct: 4 ERLRRT^MFVPGNNPA^TVKDAGIFGADSIMFDLEDAVSIAEKDSARYLVYEALQTVDYGS 63 

Query: 62 VETWRVNGLDS-CGALDIEAWLAGVNVIRLPKTETAQDI IDVEAVIERVERENSIEVG 120 
E WR+NGLD+ DI+A+V AG++VIRLPK ETA + ++E++I E+E VG 
Sbjct: 64 SELWRINGLDTPFYKNDIKAMVKAGIDVIRLPKVETAAMMHELESLITDAEKEFGRPVG 123 

Query: 121 RTRMMAAIESAEGVmAREIAKASKRLIGIALGAEDYVTNMKTRRYPDGQELFFARSMIL 180 
T MMAAIESA GV+NA EIA AS R+IGIAL AEDY T+MKT RYPDGQEL +AR++IL 
Sbjct: 124 TTHMMAAIESALGWNAVEIANASDRMIGIALSAEDYTTDMKTHRYPDGQELLYARNVIL 183 

Query: 181 HAARAAGIAAIDTVYSDVNNTEGFQNEVRMIKQLGFDGKSVINPRQIPLVNEIYTPTKKE 240 
HAARAAGIAA DTV++++N+ EGF E ++I QLGFDGKS+INPRQI +VN++Y PT+KE 
Sbjct: 184 HAARAAGIAAFDWFTNLNDEEGFYRETQLIHQLGFDGKSLINPRQIEMVNKVYAPTEKE 243 

Query: 241 IDHAKQVIWAIREAESKGSGVISIjNGiaWDKPIVERAERVIALATAAGVLSEE 293 
I++A+ VI AI EA+ KGSGVIS+NG+MVD+P+V RA+RV+ LA A ++ E 
Sbjct: 244 INNAQNVIAAIEEAKQKGSGVIS^GQWTORPVVLRAQRVMKIANANHLVDSE 296 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
Example 2725 

A DNA sequence (GASx1182) was identified in S.pyogenes <SEQ ID 7949> which encodes the amino acid 
sequence <SEQ ID 7950>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3554 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA71633 GB:Y10621 CILA, citrate CoA-transferase alpha subunit 
[Leuconostoc mesenteroides] 
Identities = 294/511 (57%), Positives = 378/511 (73%), Gaps = 7/511 (1%) 

Query: 4 NKLGRDIPQPYADQY--GVFEGELANIKQYDESSRRIKPVKPGDSKLLGSVREAIEKTGL 61 
NK+ D+P +Q VFE + +++ G+SK+ S+ + + T L 
Sbjct: 3 NKVNIDVPDAILEQLDDSVFESTNYGNPEIQRVGPKVRATT-GESKVQSSIDDVLSNT-L 60 

Query: 62 TDGMTISFHHHFREGDFIMNIWLEEIAKMGIKNLSIAPSSIANV-HEPLIDHIKNGVVTN 120 
DGMTISFHHHFREGDF+ N V+ +1 MG +NL++APSS+ NV ++ +1+ IK GWTN 
Sbjct: 61 KEGOTISFHHHFREGDFVTOKVMRKIIDMGYQI^TIA^ 120 

Query: 121 ITSSGLRDKVGAAISEGLMENPWIRSHGGRARAIASGDIHIDVAFLGAPSSDAYGNVNG 180 
ITSSG+R +G A+S G+++NPV+ RSHG RARAI SG+I IDVAFLG P+SD GN NG 
Sbjct: 121 ITSSGMRGTLGDAVSHGILKNPVIFRSHGARARAIESGE1KIDVAFLGVPNSDEMGNANG 180 

Query: 181 TKGKATCGSLGYAMIDAKYADQ WI LTDNLVPYPNTPI S I PQTD VDYWTVDAIGDPQGI 240 
G A GSLGYA+IDA+YAD++V++TD ++PYPNTP SI QT VDYW VD +GDP I 
Sbjct: 181 MNGDAAFGSLGYALIDAQYADKLVLITDTIMPYPNTPASIKQTQVDYWKVDKVGDPDKI 240 

Query: 241 AKGATRFTKNPKELLIAEYAAKVITNSPYFKEGFSFQTGTGGASLAVTRFMREAMIKENI 300 
GATRFTK+PKEL IA+ VI NS YFK FSFQTG+GGA+LAVTRF+REAM+ +NI 
Sbjct: 241 GSGATRFTKDPKELKIAKTVNDVITOSKYFKNDFSFQTGSGGAAl^VTRFLREAMMAQNI 300 

Query: 301 KASFALGGITOAMWLLEEELVEKILDVQDFDHPSAVSLGKHAEHYEIDANMYASPLSKG 360 
ASFALGGIT V+LL E LV +++DVQDFD +A S+ EIDA+ YA P +KG 
Sbjct: 301 MAS FALGGI TKPTVDLLNEGLVNRVMDVQDFDKGAAS SMKLS PNQQE IDASWYADPANKG 360 

Query: 361 RVINQLDTCILSALEVDTNFNVNVMTGSDGVIRGASGGHCDTAFAAKMSLVISPLIRGRI 420 
A++++LD ILSALEVDTNFNVNVM+GSDGVIRGA GGH D A AK++++ PL+RGRI 
Sbjct: 361 AMVDKLDVAILSALEVDTNFNVNVMSGSDGVIRGAIGGHQDAA-TAKLTIISVPLVRGRI 419 

Query: 421 PTFVDEVNTVITPGTSVDVIVTEVGIAINPNRQDLVDHFKSL-NVPQFSIEELKEKAYAI 479 
T V +VNTVITPG S+DV+VTEVGIAINP R DLV+ K + +P +SIEEL++KA I 
Sbjct: 420 ATIVPKvNWITPGDSIDVVvTEVGIAINPKRTDLVEQLiKQVPGLPIYSIEELQQKAEKI 479 

Query: 480 VGTPERIQYGDKWALIEYRDGSLMDWYNV 510 
VG P +++ D+WA+ EYRDGS++D++ V 
Sbjct: 480 VGQPAPLKFTDRWAVAEYRDGSVIDIIKEV 510 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2726 

A DNA sequence (GASx1183) was identified in S.pyogenes <SEQ ID 7951> which encodes the amino acid 
sequence <SEQ ID 7952>. Analysis of this protein sequence reveals the following: 



WO 02/34771 



PCT/GB01/04789 




Possible site: 13 

>>> Seems to have a cleavable N-term signal seq. 



Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA71634 GB:Y10621 CILG, hypothetical protein [Leuconostoc 
mesenteroides] 

Identities = 65/176 (36%), Positives = 97/176 (54%), Gaps = 3/176 (1%) 

Query: 21 DTYFSGEAIQLSDMLRAREERALRQLHLLKEYPEGSLLSVTMNIPGPIKTSPKLLEAFDI 80 
D + GE + L +L RE R Q L+ +P + SV +N+PGPIKTSPKL F I 
Sbjct: 2 DYFEGGERLNLMQVLDNREWREKYQKQLmSFPTAVITSVKLNLPGPIKTSPKLQSVFQI 61 

Query: 81 VIKAIQTALADDKICYQLRLL-PTTGYEYYLITSLPSRDLKLKMIALETELPIGRLMDLD 139 
+1 + D +1 + + TG + + +TS + +K MI E +GRL+DLD 
Sbjct: 62 IINDLNPVFKDLQIIKEASFVDQITGPDIFFVTSGCLKLVKQIMITFEESHLLGRLLDLD 121 

Query: 140 VLVLQNDLPHSISRTVLGGSPRQCFICSKEAKVCGRLRKHSVEEMQTAISKLLHSF 195 
V+ D +SR LG +PR+C +C K+AK C + HS+ E + I+K+LH+F 
Sbjct: 122 VMCQNAD--KQLSREELGFAPRKCLLCGKDAKTCIKEGNHSLAEGYSQINKMLHNF 175 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
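
Each alignment row carries its own checksum: the end coordinate minus the start coordinate, plus one, should equal the number of residues in the row once gap characters are removed. The sketch below applies that check to rows from the Example 2726 alignment; the helper name `row_is_consistent` and the row pattern are assumptions made for illustration. A failed check only signals that characters were lost or added when the text was extracted. It does not say which residues are affected.

```python
import re

# "Query: 81 RESIDUES 139" or "Sbjct: 62 RESIDUES 121"; residues must contain no spaces.
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)$")


def row_is_consistent(line: str) -> bool:
    """True when the ungapped residue count matches the printed coordinates."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError("not a recognisable alignment row: %r" % line)
    _label, start, residues, end = match.groups()
    ungapped = len(residues.replace("-", ""))
    return ungapped == int(end) - int(start) + 1


# Rows copied from the Example 2726 alignment (constant names are hypothetical).
QUERY_ROW = "Query: 81 VIKAIQTALADDKICYQLRLL-PTTGYEYYLITSLPSRDLKLKMIALETELPIGRLMDLD 139"
SBJCT_ROW = "Sbjct: 2 DYFEGGERLNLMQVLDNREWREKYQKQLmSFPTAVITSVKLNLPGPIKTSPKLQSVFQI 61"

if __name__ == "__main__":
    print(row_is_consistent(QUERY_ROW))  # coordinates and residues agree
    print(row_is_consistent(SBJCT_ROW))  # one residue short, so the row is damaged
```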

Example 2727 

A DNA sequence (GASx1184) was identified in S.pyogenes <SEQ ID 7953> which encodes the amino acid 
sequence <SEQ ID 7954>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3730 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99233 GB:U67563 oxaloacetate decarboxylase alpha chain (oadA) 
[Methanococcus jannaschii] 

Identities = 245/441 (55%), Positives = 336/441 (75%), Gaps = 5/441 (1%) 

Query: 10 IRITETvLRDGQQSQIATRMTTKEMIPIIjETLDNAGYHALEMWGGATFDSCLRFLNEDPW 69 
++I +T RD QQS IATRM T++M+PI E +D G++++E+WGGATFD+C+R+LNEDPW 
Sbjct: 2 VKIVDTTFRDAQQSLIATRMRTEDMLPIAEKMDEVGFYSMEVWGGATFDACIRYLNEDPW 61 

Query: 70 ERLRAIRKAVKKTKLQMLLRGQNLIX3YROTADDVTOSFIQKSIENGIDIWIFDAIjNDPR 129 
ERLRA++K ++ T LQMLLRGQNL+GYR+Y DD+V F+ K+ ENGIDI RIFDALND R 
Sbjct: 62 ERLRALKKRIQNTPLQMLLRGQNLVGYRHYPDDIVEKFVIKAHENGIDIFRIFDALNDVR 121 

Query: 130 NLQTAVSATKKFGGHAQ VAI SYTTSPVHTIDYFVELAKAYQAIGADS I CI KDMAGVLTPE 189 
N++TA+ KK G Q AI YT SPVHTID +VELAK + +G DS I C I KDMAG+LTP 
Sbjct: 122 NMETAIKTAKKVGAEVQGAICYTISPVHTIDQYVELAKKLEEMGCDSICIKDMAGLLTPY 181 

Query: 190 IGYQLVKCIKEOTTIPLEVHTHATSGISEMTYLKVAEAGADIIDTAISSFSGGTSQPATE 249 
GY+LVK +KE ++P++VH+H TSG++ MTYLKV EAGAD++D AIS F+ GTSQP TE 
Sbjct: 182 EGYELVKRLKEE I SLPIDVHSHCTSGLAPMTYLKVIEAGADMVDCAI S PFAMGTSQPPTE 241 

Query: 250 SMAIALTDLGFDTGLDMQEVAKVAEYFNT I RDHYRE I GI LNPKVKDTEPKTL I YQVPGGM 309 
S+ +AL +DTGLD++ + ++ +YF +R+ Y+ + +P + + + L+ YQVPGGM 
Sbjct: 242 S I VVALKGTKYDTGLDLKLLNE I RDYFMKVREKYKM--LFS P I SQI VDARVL VYQVPGGM 299 

Query: 310 LSNLLSQLTEQGLTDKYEEVLAEVPKVRflDLGYPPLVTPLSQMVGTQALMNI I SGERYKV 369 
LSNL+SQL EQG DK+EEVL E+P+VR DLGYPPLVTP SQ+VGTQA++N+++ ERYK+ 
Sbjct: 300 LSHbVSQLKEQGALDKFEEVLQEIPRVRKDLGYPPLVTPTSQIVGTQAVLNVLTEERYKI 359 

Query: 370 VPNEIKDYVRGLYGQSPAPLAEGIKEKIIGD-EAVITCRPADLIEPQMIYLRDEIAP--Y 426 
+ NE+ +YV+G YG+ PAP+ + ++++ + E ITCRPADL+ P+ ++ E 
Sbjct: 360 ITNEVVNYWGFYGKPPAPINPELLKRVLDEGEKPITCRPADLLPPEWEKVKKEAEEKGI 419 

Query: 427 AHSEEDVLSYASFPQQARDFL 447 
EED+L+YA +PQ A FL 
Sbjct: 420 VKKEED I LTYALYPQI AVKFL 440 





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2728 

A DNA sequence (GASx1185R) was identified in S.pyogenes <SEQ ID 7955> which encodes the amino 
acid sequence <SEQ ID 7956>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2497 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 


No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF93960 GB:AE004165 citrate (pro-3S)-lyase ligase [Vibrio cholerae] 
Identities = 118/336 (35%), Positives = 183/336 (54%), Gaps = 5/336 (1%) 

Query: 4 YTISKVFPSDKTTMASVKNLLHQEGIRLDAHLDYTCAIMNAQNDVIATGSYFGNSLRCLC 63 
YT S+V ++T + +K L Q + +D +++ + N +IA G G+ L+ + 
Sbjct: 10 YTFSRVSTKNRTKLLQIKEFLCQHQLTVDDDVEHF-WAYGTNQIIACGGIAGHVLKSIA 68 

Query: 64 VSSAYCGEGLLNRIVSHLIDEEYALGNYHLFVYTKTSSAAFFKDLGFTEIVHIDNHISFL 123 
VS A QG G ++++ L + Y +G + LF++TK ++ F+ GF + ++ HI+ L 
Sbjct: 69 VSPALCGTGFALKLMTELTNFAYEMGRFSLFLFTKPANIDLFRQCGFFLVDKVEPHIALL 128 

Query: 124 ENKKTGFQDYLMTLNKPEQTPGKVAAIVINANPFTLGHQFLVEKAARENDWVHLFMVSED 183 
EN Y L + + K+ +IV+NANPFTLGHQ+L+E+A + DWVHLF+V + 
Sbjct: 129 ENSPNRLSOTCKQLQLLKMSGRKIGSIVMNANPFTLGHQYLIEQACEQCDWVHLFWKAE 188 

Query: 184 RSLIPFSVRKRLIQEGLAHLDNVIYHETGPYLISQATFPAYFQKEDNDVIKSQALLDTAI 243 
++ R +1+ G HL N+ H Y+IS+ATFP+YF K+ V +S LD +1 
Sbjct: 189 NKDFSYADRMAMIKAGSKHLLNLTIHSGSDYIISRATFPSYFIKDQQVVNQSHTALDLSI 248 

Query: 244 FL-KIAQTLQITKRYVGEEPTSRVTAIYNEIM AEQLQQAGILLDILPRKAINQQQDP 299 
F IA LIT R+VG EP VT YN+ M E+ A + ++ + Q p 
Sbjct: 249 FRHSIAPALGITHRWGSEPICTVTRHYNQAMRRWLEEAHDASAPIQVVEIERSQQASQP 308 

Query: 300 ISASTARQALKDNDWDLLAKLLPKTSLDYFCSLKAQ 335 
ISAS R LK + +A L+PKT+ Y C A+ 
Sbjct: 309 ISASRVRYLLKQFGFAAIADLVPKTTYSYLCQHYAE 344 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2729 

A DNA sequence (GASx1187) was identified in S.pyogenes <SEQ ID 7957> which encodes the amino acid 
sequence <SEQ ID 7958>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4790 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2730 

A DNA sequence (GASx1188R) was identified in S.pyogenes <SEQ ID 7959> which encodes the amino 
acid sequence <SEQ ID 7960>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3956 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2731 

A DNA sequence (GASx1190) was identified in S.pyogenes <SEQ ID 7961> which encodes the amino acid 
sequence <SEQ ID 7962>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1274 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2732 

A DNA sequence (GASx1196R) was identified in S.pyogenes <SEQ ID 7963> which encodes the amino 
acid sequence <SEQ ID 7964>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2733 

A DNA sequence (GASx1211) was identified in S.pyogenes <SEQ ID 7965> which encodes the amino acid 
sequence <SEQ ID 7966>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1850 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2734 

A DNA sequence (GASx1219R) was identified in S.pyogenes <SEQ ID 7967> which encodes the amino 
acid sequence <SEQ ID 7968>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2284 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2735 

A DNA sequence (GASx1225) was identified in S.pyogenes <SEQ ID 7969> which encodes the amino acid 
sequence <SEQ ID 7970>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2062 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2736 

A DNA sequence (GASx1229) was identified in S.pyogenes <SEQ ID 7971> which encodes the amino acid 
sequence <SEQ ID 7972>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2755 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 


No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2737 

A DNA sequence (GASx1247R) was identified in S.pyogenes <SEQ ID 7973> which encodes the amino 
acid sequence <SEQ ID 7974>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.32 Transmembrane 55 - 71 ( 53 - 81) 
INTEGRAL Likelihood = -6.00 Transmembrane 74 - 90 ( 72 - 95) 
INTEGRAL Likelihood = -2.18 Transmembrane 95 - 111 ( 95 - 111) 
INTEGRAL Likelihood = -1.54 Transmembrane 124 - 140 ( 123 - 141) 

Final Results 

bacterial membrane Certainty=0.3527 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14326 GB:Z99116 yqjA [Bacillus subtilis] 
Identities = 97/306 (31%), Positives = 154/306 (49%) 

Query: 6 RTLKMTLATIVAILIAYQLHLDYAMSAGIIALLSvLDTRKSSLWARNRLLSFFLAFGIA 65 
RT+K L T +AI 1+ LHL SAGII +L + T+K SL + R + LA + 
Sbjct: 7 RTIKTALGTALAIYISQLLHLQNFASAGIITILCIQITQKRSLQASWARFWACCIAIAFS 66 

Query: 66 MMCFSLFGFTTVGFMCYLLIIIPLLYHFQIEAGLVPITVLVTHLIAKKSIALPILSNEFM 125 
+ F L G+ LLI IP+ +1 G+V +V++ HL I + NE 
Sbjct: 67 YLFFELIGYHPFVIGALLLIFIPITVLLKINEGIVTSSVIILHLYMSGGITPTFIWNEVQ 126 

Query: 126 LFFVGTSVALLFNAYMGPQDQQIRYYHQKVESDLKGILYRFESFLLEGKGQNEGLLIKNL 185 
L VG VALL N YM D+++ Y +K+E + I E +LL G+ G I 
Sbjct: 127 LITVGIGVALLMNLYMPSLDRKLIAYRKKIEDNFAVIFAEIERYLLTGEQDWSGKEIPET 186 

Query: 186 DKILDEALKLvYRERHNQLFQQTNYQVHYFEMRRQQNRLLGQmiNVNTLMRQSKESILL 245 
+++ EA L YR+ N + + N HYF+MR +Q ++ ++ V ++ + ++ 
Sbjct: 187 HQLITEAKNIAYRDVQNHILRYENLHYHYFKMREKQFEIIERLLPKVTSISITVDQGKMI 246 

Query: 246 SHLFHETACQLSEQNPALTLIDDIEQLLETFRHGDLPQTREEFERRAVLFQLLQDLERFI 305 
+ H+ + NA ++++F LP TREEFE RA LF LL ++E+++ 
Sbjct: 247 AEFIHDLREAIHPGNTAYKFLKRLADMRKEFEEMPLPATREEFEARAALFHLLGEMEQYL 306 

Query: 306 LLKVEF 311 
++K F 
Sbjct: 307 VIKSYF 312 





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2738 

A DNA sequence (GASx1261) was identified in S.pyogenes <SEQ ID 7975> which encodes the amino acid 
sequence <SEQ ID 7976>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.6082 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
Example 2739 

A DNA sequence (GASx1262R) was identified in S.pyogenes <SEQ ID 7977> which encodes the amino 
acid sequence <SEQ ID 7978>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.06 Transmembrane 38 - 54 ( 37 - 55) 

Final Results 

bacterial membrane Certainty=0.3824 (Affirmative) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2740 

A DNA sequence (GASx1265R) was identified in S.pyogenes <SEQ ID 7979> which encodes the amino 
acid sequence <SEQ ID 7980>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2741 

A DNA sequence (GASx1270) was identified in S.pyogenes <SEQ ID 7981> which encodes the amino acid 
sequence <SEQ ID 7982>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4063 (Affirmative) < succ> 

bacterial membrane Certainty=0.0000 (Not Clear) < succ> 

bacterial outside Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2742

A DNA sequence (GASx1290R) was identified in S.pyogenes <SEQ ID 7983> which encodes the amino
acid sequence <SEQ ID 7984>. Analysis of this protein sequence reveals the following:

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.37 Transmembrane 180 - 196 ( 172 - 207)
INTEGRAL Likelihood =-10.19 Transmembrane 34 - 50 ( 30 - 53)
INTEGRAL Likelihood = -4.09 Transmembrane 233 - 249 ( 232 - 250)

Final Results

bacterial membrane Certainty=0.5946 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB88010 GB:L21856 MalA [Streptococcus pneumoniae]
Identities = 66/237 (27%), Positives = 105/237 (43%), Gaps = 28/237 (11%)

Query: 45 MIPVTLHYANMTTYPLERIVTKSLSPITDKTYQALTQGKIEKD---TFQGQSLIRRD--- 98
M+P+ + ++ TYPLE + P+TDK Q L++ D T+ G +
Sbjct: 1 MVPIAIQNSSQETYPLETFIDNVYEPLTDKWQDLSEHATI VDGTLTYTGTASQAPS WI 60

Query: 99 GELVLAVLPTKVDLEQLASESTRQIIVTKKEWRFVTPDGKEL-RAHVRGQQQSLADLTW 157
G + LP + L T +++++K + KEL R R Q T
Sbjct: 61 GPSQIKELPKDLQLHF DTNELVISK ESKELTRI SYRAI Q TEG 102

Query: 158 KAVKDFVNQQWY---DSNKASVLQFLLLTFVLMVCVGTLIVIGLGAFFLTLTKRSRLFMI 214
KD + Q + +N+ + FL+L + + IV L +TK+SRLF
Sbjct: 103 FKSKDSLTQAFIRLVPTNRVYISLFLVLGASFLFGLNFFIVSLGACLLLYITKKSRLFSF 162

Query: 215 RNFSEGLGLMVNCLAWPSLLAIALSFFIQDPVLIMNCQVFGTLLMLTWVFYKTQFRD 271
R F E ++NCL P+L+ + L F Q+ ++ Q +L L +FYKT FRD
Sbjct: 163 RTFKECYHFILNCLGLPTLITLILGLFGQNMTTLITVQNILFVLYLVTIFYKTHFRD 219


Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.
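
The summary line that heads each alignment (Identities, Positives, Gaps) carries both raw counts and a reported percentage. A minimal sketch for pulling those numbers out is given below; it assumes the "name = count/length (pct%)" layout seen in this section, and `parse_summary` is a hypothetical helper name. The reported percentages are kept as printed rather than recomputed.

```python
import re

# One "Name = count/length (pct%)" field of an alignment summary line.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def parse_summary(line):
    """Return counts, alignment length and reported percentage per field."""
    stats = {}
    for name, count, length, pct in SUMMARY_RE.findall(line):
        stats[name.lower()] = {
            "count": int(count),
            "length": int(length),
            "reported_pct": int(pct),
        }
    return stats

line = "Identities = 66/237 (27%), Positives = 105/237 (43%), Gaps = 28/237 (11%)"
stats = parse_summary(line)
identity_fraction = stats["identities"]["count"] / stats["identities"]["length"]
print(stats)
print(f"identity fraction: {identity_fraction:.4f}")
```

Summary lines without a Gaps field, which also occur in this section, simply yield a dictionary with two entries.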



Example 2743 

A DNA sequence (GASx1294) was identified in S.pyogenes <SEQ ID 7985> which encodes the amino acid
sequence <SEQ ID 7986>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.2104 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 



WO 02/34771 



-2788- 



PCT/GB01/04789 



Example 2744 

A DNA sequence (GASx1303R) was identified in S.pyogenes <SEQ ID 7987> which encodes the amino
acid sequence <SEQ ID 7988>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.07 Transmembrane 13 - 29 ( 8 - 38)

Final Results

bacterial membrane Certainty=0.4227 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2745 

A DNA sequence (GASx1307R) was identified in S.pyogenes <SEQ ID 7989> which encodes the amino
acid sequence <SEQ ID 7990>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2746 

A DNA sequence (GASx1312R) was identified in S.pyogenes <SEQ ID 7991> which encodes the amino
acid sequence <SEQ ID 7992>. Analysis of this protein sequence reveals the following: 

Possible site: 21 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1996 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2747 

A DNA sequence (GASx1316R) was identified in S.pyogenes <SEQ ID 7993> which encodes the amino
acid sequence <SEQ ID 7994>. Analysis of this protein sequence reveals the following:

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3504 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif: 271-273

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC66321 GB:AE000792 outer surface protein, putative [Borrelia burgdorferi]
Identities = 127/365 (34%), Positives = 195/365 (52%), Gaps = 14/365 (3%)

Query: 1 MVDLGFSLYPERYDVTKSKAYIDLCHSYGAKRLFMSLLQLAPADHQMFHCYAELIAYANQ 60
M ++G S+YP K Y++ +G ++F SLL + + F + EL++ AN+
Sbjct: 1 MKEIGISIYPNVSPKNKIIKYLEKSAHFGFTQVFTSLLYI NGNEFDIFKELLSIANK 57

Query: 61 LGIRVIADVSPSFISQAGWSDQLIERA HAFGLAGLRLDEALPLAEIVTLTRNPF 114
G++ I DVSP + G + G +RLD E +T N
Sbjct: 58 NGMKPIIDVSPEIFKELGIDLSNLRNCPKLDYFKKLGAWAIRLDNTFTGIEESLMTFNDS 117

Query: 115 GLKIELNMSTDKQLLMSLLATDAERSNIIGCHNFYPHEFTGLSWQHFKDMSRFYHEHDIE 174
LKI+LN+S + + +++ N++GCHNFYPH++TGLS FK+ ++ + +I
Sbjct: 118 DLKIQLNISNINKHIDTIMYFKPNIKNLLGCHNFYPHKYTGLSRNFFKETTKIFKHYSIP 177

Query: 175 TAAFITAQSASE-GPWLLAEGLPTVEDHRHLPIGLQVELMKAIGTIDNILISNQFISEEE 233
TAAFI++ +A E EG+PT+E HR I Q + + G ID +LISN F SE E
Sbjct: 178 TAAFISSNNAEECARGKEKEGVPTLESHRSKDIETQAKDLFKEG-IDTVLISNCFPSETE 236

Query: 234 LAACTQALARPVTTIKVRPIIDLTEVEEQII-GYPHCYRGDVSDYVIRSTMPRLVYAQES 292
L ++ + R + +K D VE++II H RGD++ Y IRSTMPR+ Y +
Sbjct: 237 LKKVSK-VNRNILELKADIMPDANSVEKEIILENLHFNRGDINSYRIRSTMPRVYYNNKK 295

Query: 293 IAPRDQSKEVKRGSIIIDNDRYHRYKGELQIALKNFTVSSKANWAEVREDYLSLLDDLR 352
P E+K+G I+ID+ Y Y GELQIALK+ + NW ++ D + LL+ +
Sbjct: 296 F-PVHSPNEIKKGDILIDSSEYLGYTGELQIALKDTPNNGLVNWGKIINDEIYLLEKIE 354

Query: 353 PWQEF 357
PW++F
Sbjct: 355 PWEKF 359

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 
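
The RGD motif reported at residues 271-273 in this example can be cross-checked against the query row of the alignment. The sketch below is illustrative only: it assumes the fragment shown is the part of the Query 234-292 row that follows the gap character, with OCR spacing removed, and that its first residue is number 265 (obtained by counting from 234). The helper name `find_motif` is hypothetical.

```python
def find_motif(sequence, motif, first_residue=1):
    """Return (start, end) residue numbers of every occurrence of motif.

    first_residue is the residue number of sequence[0]; overlapping
    occurrences are reported as well.
    """
    hits = []
    index = sequence.find(motif)
    while index != -1:
        begin = first_residue + index
        hits.append((begin, begin + len(motif) - 1))
        index = sequence.find(motif, index + 1)
    return hits

# Assumed fragment: query residues 265 onward, taken from the alignment row.
fragment = "GYPHCYRGDVSDYVIRSTMPRL"
hits = find_motif(fragment, "RGD", first_residue=265)
print(hits)
```

Under those assumptions the single hit coincides with the 271-273 range stated in the example.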

Example 2748 

A DNA sequence (GASx1319) was identified in S.pyogenes <SEQ ID 7995> which encodes the amino acid
sequence <SEQ ID 7996>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.50 Transmembrane 127 - 143 ( 125 - 151)
INTEGRAL Likelihood = -7.43 Transmembrane 17 - 33 ( 15 - 36)
INTEGRAL Likelihood = -5.68 Transmembrane 39 - 55 ( 36 - 57)
INTEGRAL Likelihood = -1.86 Transmembrane 60 - 76 ( 59 - 77)
INTEGRAL Likelihood = -0.59 Transmembrane 85 - 101 ( 85 - 101)

Final Results

bacterial membrane Certainty=0.4800 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2749 

A DNA sequence (GASx1320) was identified in S.pyogenes <SEQ ID 7997> which encodes the amino acid
sequence <SEQ ID 7998>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.81 Transmembrane 35 - 51 ( 35 - 51)

Final Results

bacterial membrane Certainty=0.1723 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2750 

A DNA sequence (GASx1321) was identified in S.pyogenes <SEQ ID 7999> which encodes the amino acid
sequence <SEQ ID 8000>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2751

A DNA sequence (GASx1329) was identified in S.pyogenes <SEQ ID 8001> which encodes the amino acid
sequence <SEQ ID 8002>. Analysis of this protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.28 Transmembrane 64 - 80 ( 64 - 80)

Final Results

bacterial membrane Certainty=0.1510 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2752 

A DNA sequence (GASx1332R) was identified in S.pyogenes <SEQ ID 8003> which encodes the amino
acid sequence <SEQ ID 8004>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2753 

A DNA sequence (GASx1333) was identified in S.pyogenes <SEQ ID 8005> which encodes the amino acid
sequence <SEQ ID 8006>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2754 

A DNA sequence (GASx1335R) was identified in S.pyogenes <SEQ ID 8007> which encodes the amino
acid sequence <SEQ ID 8008>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF96047 GB:AE004354 uridine phosphorylase [Vibrio cholerae] 
Identities = 46/167 (27%), Positives = 72/167 (42%), Gaps = 12/167 (7%)

Query: 8 GVKEMISTGTCGVLVP-IAENRFLVPVKALRDEGTSYHWAPSRYIDIDPKMLRLIEKTL 66
G K ++ G+ G + I ++ A+RDEG S Y+ + +++ +++ L
Sbjct: 79 GAKAITOVGSAGAMQSEIGLGELILVEGAVRDEGGSKAYIGAAYPAYSSFELVVEMQRFL 138

Query: 67 LAQ^IAYQEVITWSTDGFYR-ETKEKVAHRQEEGCSVVEMECSALAAVAQLRG IL 120
Q + I S D FY E E + +G +ME SAL V +LRG +L
Sbjct: 139 AEQSVPIHRGIVRSHDSFYTDEEAELCRYWHRKGILAADMETSALLTVGRLRGLQVASVL 198

Query: 121 WGQLLFTADTLADVEVY DQRNWGADSFSFALHLCLEVLNTLEKD 164
+L+ DAVY DQR + + A L LN L+ D
Sbjct: 199 NNWLYEQDVQAGVNQYVNADQRMMQGE--TLAARAALHALNALKFD 243

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2755 

A DNA sequence (GASx1353) was identified in S.pyogenes <SEQ ID 8009> which encodes the amino acid
sequence <SEQ ID 8010>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.79 Transmembrane 241 - 257 ( 234 - 260)
INTEGRAL Likelihood = -5.15 Transmembrane 44 - 60 ( 43 - 65)
INTEGRAL Likelihood = -4.78 Transmembrane 74 - 90 ( 72 - 92)

Final Results

bacterial membrane Certainty=0.3314 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2756

A DNA sequence (GASx1354R) was identified in S.pyogenes <SEQ ID 8011> which encodes the amino
acid sequence <SEQ ID 8012>. Analysis of this protein sequence reveals the following:

Possible site: 55 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -3.45 Transmembrane 68 - 84 ( 65 - 86)

Final Results

bacterial membrane Certainty=0.2381 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB83831 GB:AL162753 putative integral membrane protein 
[Neisseria meningitidis] 
Identities = 31/72 (43%), Positives = 46/72 (63%), Gaps = 6/72 (8%)

Query: 17 FVI YAFDKRKAI KKKRRI SERKLL VI TVLFGGF- GALLAAKKYHHKTRKWYFVI TC 71
F +Y DKR+A++ KRRI E +LL + LFGG+ GA L ++ + HKT K FV+ T
Sbjct: 38 FALYGIDKRRAVRGKRRIPEHRLL-LPALFGGWAGAYLGSRIFRHKTAKKRFWLFRLTV 96

Query: 72 YTSILLTLLVTY 83
++L TL++ Y
Sbjct: 97 SGNVLATLILIY 108

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2757

A DNA sequence (GASx1363R) was identified in S.pyogenes <SEQ ID 8013> which encodes the amino
acid sequence <SEQ ID 8014>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2758 

A DNA sequence (GASx1367) was identified in S.pyogenes <SEQ ID 8015> which encodes the amino acid
sequence <SEQ ID 8016>. Analysis of this protein sequence reveals the following:

Possible site: 31 



>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA63508 GB:X92946 hypothetical protein [Lactococcus lactis] 
Identities = 64/96 (66%), Positives = 77/96 (79%)

Query: 1 MPRKTFDKAFKLSAVKLILEEEQPVKMVSSTLEIHPNSLYQWIQEYEKYGESAFPGHGSA 60
M R+ FDK FK SAVKLILEE VK VS LE+H NSLY+W+QE E+YGESAFPG+G+A
Sbjct: 1 MARRKFDKQFKNSAVKLILEEGYSVKEVSQELEVHANSLYRWVQEVEEYGESAFPGNGTA 60

Query: 61 LRHAQFKTKKLEKEHKLLQEELALLKKFQVFLKPNR 96
L +AQ K K LEKE++ LQEEL LLKKF+VFLK ++
Sbjct: 61 LANAQHKIKLLEKENRYLQEELELLKKFRVFLKRSK 96

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 
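
Because this alignment contains no gaps, the identity count in its summary line can be checked directly from the Query and Sbjct rows. The following is a minimal sketch of that check, assuming the two rows are transcribed exactly as printed above; `count_identities` is a hypothetical helper, and positives are not computed because that would require a substitution matrix not given in the text.

```python
def count_identities(query, subject):
    """Count alignment columns where both rows carry the same residue.

    Columns holding a gap character ("-") never count as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")

# Query and Sbjct rows of the two alignment blocks, as printed in the example.
blocks = [
    ("MPRKTFDKAFKLSAVKLILEEEQPVKMVSSTLEIHPNSLYQWIQEYEKYGESAFPGHGSA",
     "MARRKFDKQFKNSAVKLILEEGYSVKEVSQELEVHANSLYRWVQEVEEYGESAFPGNGTA"),
    ("LRHAQFKTKKLEKEHKLLQEELALLKKFQVFLKPNR",
     "LANAQHKIKLLEKENRYLQEELELLKKFRVFLKRSK"),
]

identities = sum(count_identities(q, s) for q, s in blocks)
columns = sum(len(q) for q, _ in blocks)
print(f"Identities = {identities}/{columns}")
```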

Example 2759 

A DNA sequence (GASx1374R) was identified in S.pyogenes <SEQ ID 8017> which encodes the amino
acid sequence <SEQ ID 8018>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2585 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2760 

A DNA sequence (GASx1382R) was identified in S.pyogenes <SEQ ID 8019> which encodes the amino
acid sequence <SEQ ID 8020>. Analysis of this protein sequence reveals the following:

Possible site: 14 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.39 Transmembrane 3 - 19 ( 3 - 19)

Final Results

bacterial membrane Certainty=0.1956 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2761 

A DNA sequence (GASx1391R) was identified in S.pyogenes <SEQ ID 8021> which encodes the amino
acid sequence <SEQ ID 8022>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2762

A DNA sequence (GASx1404) was identified in S.pyogenes <SEQ ID 8023> which encodes the amino acid
sequence <SEQ ID 8024>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3046 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2763 

A DNA sequence (GASx1412R) was identified in S.pyogenes <SEQ ID 8025> which encodes the amino
acid sequence <SEQ ID 8026>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1590 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2764

A DNA sequence (GASx1414R) was identified in S.pyogenes <SEQ ID 8027> which encodes the amino
acid sequence <SEQ ID 8028>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2816 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2765 

A DNA sequence (GASx1416) was identified in S.pyogenes <SEQ ID 8029> which encodes the amino acid
sequence <SEQ ID 8030>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1744 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2766 

A DNA sequence (GASx1417) was identified in S.pyogenes <SEQ ID 8031> which encodes the amino acid
sequence <SEQ ID 8032>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3771 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2767 

A DNA sequence (GASx1419R) was identified in S.pyogenes <SEQ ID 8033> which encodes the amino
acid sequence <SEQ ID 8034>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.93 Transmembrane 4 - 20 ( 1 - 25)

Final Results

bacterial membrane Certainty=0.5373 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2768 

A DNA sequence (GASx1423) was identified in S.pyogenes <SEQ ID 8035> which encodes the amino acid
sequence <SEQ ID 8036>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.97 Transmembrane 30 - 46 ( 25 - 49)
INTEGRAL Likelihood = -7.80 Transmembrane 52 - 68 ( 50 - 72)
INTEGRAL Likelihood = -6.95 Transmembrane 129 - 145 ( 125 - 146)

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
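
The INTEGRAL lines in this example list each predicted transmembrane segment with a likelihood score, a core span and a wider span in parentheses. The sketch below shows one way such lines might be parsed and ordered along the protein; the `Segment` type, its field names and `parse_segments` are assumptions made for illustration, and the reading that a more negative likelihood marks a stronger prediction is likewise an assumption rather than something stated in the text.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float   # score printed after "Likelihood ="
    start: int          # first residue of the core span
    end: int            # last residue of the core span
    outer_start: int    # first residue of the span in parentheses
    outer_end: int      # last residue of the span in parentheses

SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return all segments found in text, ordered by core start position."""
    segments = [
        Segment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        for m in SEGMENT_RE.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)

example = """
INTEGRAL Likelihood = -8.97 Transmembrane 30 - 46 ( 25 - 49)
INTEGRAL Likelihood = -7.80 Transmembrane 52 - 68 ( 50 - 72)
INTEGRAL Likelihood = -6.95 Transmembrane 129 - 145 ( 125 - 146)
"""
segments = parse_segments(example)
lowest = min(segments, key=lambda seg: seg.likelihood)
for seg in segments:
    print(seg)
print("lowest likelihood:", lowest)
```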

Example 2769 

A DNA sequence (GASx1426R) was identified in S.pyogenes <SEQ ID 8037> which encodes the amino
acid sequence <SEQ ID 8038>. Analysis of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -3.45 Transmembrane 36 - 52 ( 36 - 55)

Final Results

bacterial membrane Certainty=0.2381 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC39287 GB:AF115103 orf87 gp [Streptococcus thermophilus 
bacteriophage Sfi21] 
Identities = 43/73 (58%), Positives = 61/73 (82%)

Query: 1 MINLKLRLQNKVTLMAILGAIFLLAQQLGIKLPSNIADIANTAVTLLVLLGVVTDPTTKG 60
MIN KLRLQNK TL+A++ A+FL+ QQ G+ +P+NI + NT V +LV+LG++TDPTTKG
Sbjct: 8 MINFKLRLQNKATLVALISAVFLMLQQFGLHVPNNIQEGINTLVGILVILGIITDPTTKG 67

Query: 61 LSDSEQALTYHEP 73
++DSE+AL+Y +P
Sbjct: 68 IADSERALSYIQP 80

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2770 

A DNA sequence (GASx1427R) was identified in S.pyogenes <SEQ ID 8039> which encodes the amino
acid sequence <SEQ ID 8040>. Analysis of this protein sequence reveals the following: 

Possible site: 27 



>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.03 Transmembrane 2 - 18 ( 1 - 23) 



Final Results 

bacterial membrane Certainty=0.2211 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2771 

A DNA sequence (GASx1428R) was identified in S.pyogenes <SEQ ID 8041> which encodes the amino
acid sequence <SEQ ID 8042>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1017 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2772 

A DNA sequence (GASx1429R) was identified in S.pyogenes <SEQ ID 8043> which encodes the amino
acid sequence <SEQ ID 8044>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3097 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2773 

A DNA sequence (GASx1431R) was identified in S.pyogenes <SEQ ID 8045> which encodes the amino
acid sequence <SEQ ID 8046>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2584 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA98101 GB:M19348 hyaluronidase [Streptococcus pyogenes phage 
H4489A] 

Identities = 337/371 (90%), Positives = 351/371 (93%), Gaps = 1/371 (0%)

Query: 1 MAENIPLRVQFKRMKAAEWASSDWLLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG 60
M ENI PLRVQFKRM A EWA SDV+LLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG
Sbjct: 1 MTENIPLRVQFKRMSADEWARSDVILLEGEIGFETDTGFAKFGDGQNTFSKLKYLTGPKG 60

Query: 61 PKGDTGLQGKTGGTGSRGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK 120
PKGDTGLQGKTGGTG RGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK
Sbjct: 61 PKGDTGLQGKTGGTGPRGPAGKPGTTDYDQLQNKPDLGAFAQKEETNSKITKLESSKADK 120

Query: 121 NAVYLKAESNAKLDEKLNLKGGVMTGQLQFKPN-SGIKPSSSVGGAINIDMSKSEGAAMV 179
+AVY KAES +LD+KL+L GG++TGQLQFKPN SGIKPSSSVGGAINIDMSKSEGAAMV
Sbjct: 121 SAVYSKAESKIELDKKLSLTGGIVTGQLQFKPNKSGIKPSSSVGGAINIDMSKSEGAAMV 180

Query: 180 MYTNKDTTDGPLMILRSNKDTFDQSVQFVDYKGTTNAVNIVMRQPTTPNFSSALNITSAN 239
MYTNKDTTDGPLMILRS+KDTFDQS QFVDY G TNAVNI VMRQP+ PNFSSALNITSAN
Sbjct: 181 MYTNKDTTDGPLMILRSDKDTFDQSAQFVDYSGKTNAVNIVMRQPSAPNFSSALNITSAN 240

Query: 240 EGGSAMQIRGVEKALGTLKITHENPSVDKEYDENAAALSIDIVKKQKGGKGTAAQGIYIN 299
EGGSAMQIRGVEKALGTLKITHENP+V+ +YDENAAALS IDIVKKQKGGKGTAAQGI YIN
Sbjct: 241 EGGSAMQIRGVEKALGTLKITHENPNVEAKYDENAAALSIDIVKKQKGGKGTAAQGIYIN 300

Query: 300 STSGTAGKMLRIRNKNKDKFYVGPDGDFWSCASSIVDGNLTVKDPTSGKHAATKDYVDEK 359
STSGTAGKMLRIRNKN+DKFYVGPDG F S A+S V GNLTVKDPTSGKHAATKDYVDEK
Sbjct: 301 STSGTAGKMLRIRNK^DKFWGPDGGFHSGANSTVAGNLTVKDPTSGKHAATKDYVDEK 360

Query: 360 IAELKKLILKK 370
IAELKKLILKK
Sbjct: 361 IAELKKLILKK 371

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2774 

A DNA sequence (GASx1438R) was identified in S.pyogenes <SEQ ID 8047> which encodes the amino
acid sequence <SEQ ID 8048>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1892 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence <SEQ ID 10439> was identified in GBS which encodes amino acid sequence 
<SEQ ID 10440>. 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18711 GB:U38906 ORF36 [Bacteriophage r1t]
Identities = 70/111 (63%), Positives = 88/111 (79%)

Query: 1 LIEVIIKKYLDEHLDVPSFFEHQKDEPARFIILEKTSGAKQNHLLSSTFAFQSYAESLYE 60
+IE+IIK +LD HL V SF E + + P +I+ EKT +K NHLLSSTFAFQSYA S+YE
Sbjct: 1 MIEIIIKNFLDTHLSVSSFLEKKGEMPLSYILFEKTGSSKSNHLLSSTFAFQSYAPSMYE 60

Query: 61 AALLNDKVKQVIEQLDVLPQVSGVHLNADYNFTDTATKRYRYQAVFDINHY 111
AA LN+++K+V+E+L L ++S V LN+DYNFTDT TK YRYQAVFDINHY
Sbjct: 61 AAKINEQLKEVVERLIELNEISNVSLNSDYNFTDTETKEYRYQAVFDINHY 111

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
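
The summary line that heads each alignment reports raw counts and percentages. As a reading aid, the following minimal sketch shows how those figures could be extracted and the identity percentage recomputed. The helper names `parse_stats` and `floor_percent` are hypothetical, and integer truncation is only an assumed rounding convention; the percentages printed in the alignments may follow a different one.

```python
import re

# Hypothetical helper, not part of the original analysis.
# Matches e.g. "Identities = 70/111 (63%)" and tolerates stray spaces.
_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_stats(line):
    """Map each statistic name to (count, aligned_length, reported_percent)."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in _STAT.findall(line)}

def floor_percent(count, length):
    """Percentage with integer truncation (an assumed rounding convention)."""
    return 100 * count // length

line = "Identities = 70/111 (63%) , Positives = 88/111 (79%)"
stats = parse_stats(line)
for name, (count, length, reported) in stats.items():
    print(name, count, length, reported, floor_percent(count, length))
```

Comparing the recomputed value with the reported one is a quick way to spot digits damaged in transcription.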

Example 2775 

A DNA sequence (GASx1442R) was identified in S.pyogenes <SEQ ID 8049> which encodes the amino
acid sequence <SEQ ID 8050>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1241 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
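
Each example reports three localization lines under "Final Results", with one location normally marked "Affirmative". The sketch below illustrates how such a report could be read programmatically. The function names are hypothetical, and the regular expression is written loosely so that it also accepts lines in which spaces have crept into the certainty value.

```python
import re

# Hypothetical helper, not part of the original analysis.
# Accepts "bacterial <location> ... Certainty=<value> (<call>)".
_RESULT = re.compile(
    r"bacterial\s+(?P<loc>\w+)\W*Certainty\s*=\s*(?P<val>[\d.\s]+?)\s*\((?P<call>[^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (location, certainty, call) tuples."""
    rows = []
    for m in _RESULT.finditer(text):
        # Remove any stray whitespace inside the number before converting.
        certainty = float(re.sub(r"\s+", "", m.group("val")))
        rows.append((m.group("loc"), certainty, m.group("call").strip()))
    return rows

def predicted_location(text):
    """Location with the highest certainty, or None if nothing parsed."""
    rows = parse_final_results(text)
    return max(rows, key=lambda r: r[1])[0] if rows else None

report = """
bacterial cytoplasm --- Certainty=0.1241 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(report))
```

When all three certainties are equal, `max` returns the first row, so a report with no affirmative call should be checked by hand.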

Example 2776

A DNA sequence (GASx1444R) was identified in S.pyogenes <SEQ ID 8051> which encodes the amino
acid sequence <SEQ ID 8052>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4547 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2777 

A DNA sequence (GASx1447R) was identified in S.pyogenes <SEQ ID 8053> which encodes the amino
acid sequence <SEQ ID 8054>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2778 

A DNA sequence (GASx1448R) was identified in S.pyogenes <SEQ ID 8055> which encodes the amino
acid sequence <SEQ ID 8056>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3221 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2779 

A DNA sequence (GASx1449R) was identified in S.pyogenes <SEQ ID 8057> which encodes the amino
acid sequence <SEQ ID 8058>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6356 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2780 

A DNA sequence (GASx1453R) was identified in S.pyogenes <SEQ ID 8059> which encodes the amino
acid sequence <SEQ ID 8060>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2869 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2781 

A DNA sequence (GASx1455R) was identified in S.pyogenes <SEQ ID 8061> which encodes the amino
acid sequence <SEQ ID 8062>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1787 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF43512 GB:AF145054 ORF19 [Streptococcus thermophilus bacteriophage 7201]
Identities = 47/126 (37%), Positives = 86/126 (67%), Gaps = 2/126 (1%)

Query: 8 LKDLRNLDLYIASLIRRRDKIEASLL--SSPKWSSDKVNGGIKRKQDDVYVELIATAKDI 65
++ ++ LD YI S I + ++E+ L +S +D V GG ++ +DD+YVELI +++
Sbjct: 7 IQQIKALDRYIESQIEQIKRLESQALKVTSGSMHTDMVQGGKRKGKDDIYVELITAREEV 66

Query: 66 EKKTAEAIRKQRELQNLIDSLENTDSQTILSMVYIDKMTRWQVIDELNCSESTYFRLLRV 125
E+ TAEAI+++ E + I ++E+ D++++L MVYID+++ WQ+ D++ S++TY+ LR
Sbjct: 67 ERFTAEAIKQKLEFRRQIANIEDIDARSLLQMVYIDQLSIWQICDKMGISKATYYVKLRQ 126

Query: 126 ATKELN 131
A K L+
Sbjct: 127 AEKYLD 132

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2782

A DNA sequence (GASx1456R) was identified in S.pyogenes <SEQ ID 8063> which encodes the amino
acid sequence <SEQ ID 8064>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2883 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18697 GB:U38906 ORF22 [Bacteriophage r1t]
Identities = 78/207 (37%), Positives = 123/207 (58%), Gaps = 2/207 (0%)

Query: 6 EIHRILGIDEVYKAPKRLTDILFDKDSREDIFRQFLKYETDVSYDWFMQYFEEEQADRKN 65
+ + +L +DE R+ +++FDK RE+ + + L D+ D+F YF A
Sbjct: 7 QFYDMLNVDEH1WFTNRIQELVFDKKGREEFYSKILNIHHDMGVDFFRDYFMAHSAVSA- 65

Query: 66 KKQDFTPKSVSTLLSKIISGNQYYEvA-VGlX^ILICAWQEQRIJTOSPFTYRPSKYWYHV 124
K Q +TP + L + ++ G+ ++ GTG ++IQ WQ+ R+N F Y PS YWY
Sbjct: 66 KGQHYTPDELGKLTALLVGGSGGADLTGAGTGTLIIQI<WQDDRMNTDFFNYLPSNYWYQA 125

Query: 125 EELSDKAVPFLLFNMSIRGINGVVVHGDSLTRQVKNIYFLQNTKDDMLSFSDINvMPRTQ 184
ELSD+A+ FL+ +IRG+NGW+HGD+L VK +YF+QN+ ++ + FS+INV+P ++
Sbjct: 126 LELSDEAISFLIHAFAIRGMNGWIHGDAtEMAVKQVYFIQNSANNPIGFSEINVIPHSK 185

Query: 185 DIEREFNVKEWIGDGIEHIENPLIEWI 211
D + EW IEHIE+ +WI
Sbjct: 186 DAMEFLGIHEWTEQAIEHIESKFPDWI 212

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2783 

A DNA sequence (GASx1459R) was identified in S.pyogenes <SEQ ID 8065> which encodes the amino
acid sequence <SEQ ID 8066>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.44 Transmembrane 82 - 98 ( 81 - 98)

Final Results

bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
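
Where a membrane location is reported, the analysis also lists one or more "INTEGRAL" lines giving a likelihood score and the residue range of the predicted transmembrane segment. The following sketch shows one way to pull those numbers out of such a line. The names `parse_integral` and `TMSegment` are hypothetical, and treating the parenthesised pair as an outer range around the core segment is an assumption about the report layout, not something stated in the text.

```python
import re
from typing import NamedTuple, Optional

class TMSegment(NamedTuple):
    likelihood: float
    core_start: int
    core_end: int
    outer_start: int   # assumed meaning of the parenthesised range
    outer_end: int

    @property
    def core_length(self) -> int:
        return self.core_end - self.core_start + 1

# Hypothetical helper, not part of the original analysis.
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line: str) -> Optional[TMSegment]:
    """Parse one INTEGRAL line; return None if the line does not match."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    score, a, b, c, d = m.groups()
    return TMSegment(float(score), int(a), int(b), int(c), int(d))

seg = parse_integral("INTEGRAL Likelihood = -2.44 Transmembrane 82 - 98 ( 81 - 98)")
print(seg, seg.core_length)
```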

Example 2784 

A DNA sequence (GASx1460R) was identified in S.pyogenes <SEQ ID 8067> which encodes the amino
acid sequence <SEQ ID 8068>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3368 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2785

A DNA sequence (GASx1461R) was identified in S.pyogenes <SEQ ID 8069> which encodes the amino
acid sequence <SEQ ID 8070>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2834 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2786 

A DNA sequence (GASx1462R) was identified in S.pyogenes <SEQ ID 8071> which encodes the amino
acid sequence <SEQ ID 8072>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3531 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2787 

A DNA sequence (GASx1463R) was identified in S.pyogenes <SEQ ID 8073> which encodes the amino
acid sequence <SEQ ID 8074>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2483 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14569 GB:Z99117 similar to phage-related protein [Bacillus subtilis]
Identities = 98/252 (38%), Positives = 152/252 (59%), Gaps = 29/252 (11%)

Query: 16 SPAvTOTOIEQWGARAEQFTTSLLSIISNNNLLAKATSESIMGAAMKAAVLNLPIEPSLG 75
SP+V R E+V+G RA QFT S+LS+ ++ +L K S++ +AM AA L+LPI+ +LG
Sbjct: 33 SPSVIKRFEEVLGKRATQFTASILSLYNSEQMLQKTDPMSVISSAMVAATLDLPIDKNLG 92

Query: 76 FAYWPYNRNYKDGNRWITVNEAQFQIGYRGLIQLAQRSGQVRNIEHGIIYEEEFLGYDK 135
+A++VPY +AQFQ+GY+G IQLA R+GQ ++I I+E E ++
Sbjct: 93 YAWIVPYG GKAQFQLGYKGYIQLALRTGQYKSINCIPIHEGELQKWNP 140

Query: 136 IRGQLKLTGDYvBSGVVKGYFASLELISGFYKMIFWPKEKVYEHAKKYSKTFDKKTGDFK 195
+ ++++ + +S V GY A ELI+GF K ++W K +V +H KK+SK+ DF
Sbjct: 141 LTEEIEIDFEKRESDAVIGYA&YFELINGFRKTVYWTKAQVEKHKKKFSKS DF- 193

Query: 196 PGTPWATEFDPMAIKTLLKELLSKYAPLSVEMQDA-LEADNADSTIVIPKDVTPQETNSL 254
W ++D MA+KT+LK +LSK+ LSVEMQ A +E D I D+T + +S
Sbjct: 194 ---GWHTOVTOAMALKTVLKAVLSKWGILSVEMQKAVIEEDETRERI DITNEADSS- 245

Query: 255 DDLIGTQNEKKD 266
++I ++ KD
Sbjct: 246 -EIIDSEPSNKD 256

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2788 

A DNA sequence (GASx1464R) was identified in S.pyogenes <SEQ ID 8075> which encodes the amino
acid sequence <SEQ ID 8076>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4258 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2789 

A DNA sequence (GASx1465R) was identified in S.pyogenes <SEQ ID 8077> which encodes the amino
acid sequence <SEQ ID 8078>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2790 

A DNA sequence (GASx1469R) was identified in S.pyogenes <SEQ ID 8079> which encodes the amino
acid sequence <SEQ ID 8080>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2791 

A DNA sequence (GASx1470R) was identified in S.pyogenes <SEQ ID 8081> which encodes the amino
acid sequence <SEQ ID 8082>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3577 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC98430 GB:L29324 excisionase [Streptococcus pneumoniae]
Identities = 23/56 (41%), Positives = 41/56 (73%)

Query: 23 KHLIQQWEGLTVATAKQWATEMRDHPDFKQFVLNPTHRIVFIDYKGFKLFVQWKSR 78
K ++++W+GL T +W EMR++ F +V+NPTH++VFI+ +GF+ F++WK +
Sbjct: 19 KGILKRWDGLNKYTLNRWIKEMRENRTFSMYVINPTHKLVFINLEGFESFLRWKQK 74

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2792

A DNA sequence (GASx1473) was identified in S.pyogenes <SEQ ID 8083> which encodes the amino acid
sequence <SEQ ID 8084>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2725 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2793

A DNA sequence (GASx1476) was identified in S.pyogenes <SEQ ID 8085> which encodes the amino acid
sequence <SEQ ID 8086>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1422 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2794 

A DNA sequence (GASx1480R) was identified in S.pyogenes <SEQ ID 8087> which encodes the amino
acid sequence <SEQ ID 8088>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.04 Transmembrane 291 - 307 ( 290 - 309)

Final Results

bacterial membrane --- Certainty=0.2614 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2795 

A DNA sequence (GASx1489R) was identified in S.pyogenes <SEQ ID 8089> which encodes the amino
acid sequence <SEQ ID 8090>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2278 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2796 

A DNA sequence (GASx1490R) was identified in S.pyogenes <SEQ ID 8091> which encodes the amino
acid sequence <SEQ ID 8092>:

SFITSVIAFRKLLKCEGIDLYL^GDLMTCFEQLLTQLKD^
QPYKVFTPYYRIWQNYPKETPIKVELSQGRWLNLETPDDVL^
SPFLRIGAIGIRTVYHAVRQAPNSLGQATFLKEIAWRDFYbMVYVAYPDQKTQPIQKAFSQIEWV^
QKTGWMHNRLRMIVASFLTKDLLCDWRLGEQYFQQQLIDYDAASNIGGWQWAASTGTDAVPYFRIFNPVTQGKRFDPKGEFIKAYLPQLEH
VPEKYLHEPWKMPKNLQESVSCIIGTDYPQPIVDHAKQREQAIAKYEWAKEKAKIE

Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA22361 GB:M94110 DNA photolyase [Bacillus firmus]
Identities = 175/338 (51%), Positives = 228/338 (66%), Gaps = 6/338 (1%)

Query: 145 EIINQSGQPYKVFTPYYRIWQNYPKETP--IKVELSQGRWLNLETPDDVLRTVES--FKD 200
+++ + G PYKVFTPYY+ W K TP IK ++ G PD T+ + K
Sbjct: 2 QvLKJOJGTPYKVFTPYYKAWAKERKRTPAVIKRDVLLGSvHKGTAPDREAETLFNNLIKK 61

Query: 201 EKYQDVATFDE-ASKQLNRFIQDQLAAYHANRDFPAQLGTSRLSPFLRIGAIGIRTVY-H 258
Y A +E A K+L F + +L+ Y ANRDFP+ GTSRLSP+++ GA+ R++Y H
Sbjct: 62 CSYDWSAIGEEHAIKRLQMFTKKRLSGYKANRDFPSITGTSRLSPYIKTGAVSSRSIYYH 121

Query: 259 AWQAPNSLGQATFLKEIiAWRDFYNMVYVAYPDQKTQPIQKAFSQIEWVNNPDWFQLWKE 318
+ +S TFLKELAWRDFY MV+ PD K + I + + ++ W ++ D WK
Sbjct: 122 ILNAEADSYSAETFLKEIaAWRDFYRMVHFYEPDCKDREIMEGYRELNWSHDQDDLTSWKR 181

Query: 319 GKTGYPIVDAAMLQLQKTGWMHNRLRMIVASFLTKDLLCDWRLGEQYFQQQLIDYDAASN 378
G+TG+PIVDA M QL GWMHNRLRMI ASFLTKDLL DWRLGE+YF++ LIDYD +SN
Sbjct: 182 GETGFPIVDAGMRQLLNEGWMHNRLRMITASFLTKDLLIDWRLGERYFERMLIDYDPSSN 241

Query: 379 IGGWQWAASTGTDAVPYFRIFNPVTQGKRFDPKGEFIKAYLPQLEHVPEKYLHEPWKMPK 438
IGGWQWAAS GTDAVPYFRI FNP VTQ KRFD G +I+ Y+P+L HVP+ Y+HEPWKM +
Sbjct: 242 IGGWQWAASVGTDAVPYFRIFNPVTQSKRFDENGTYIRTYIPEIiNHVPDHYIHEPWKMSE 301

Query: 439 NLQESVSCIIGTDYPQPIVDHAKQREQAIAKYEWAKEK 476
Q C + DYP PIVDH+KQR++A++ ++ E+
Sbjct: 302 EEQVKYKCRLDEDYPLPIVDHSKQRKKALSFFKGDDEE 339

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 



Example 2797

A DNA sequence (GASx1493R) was identified in S.pyogenes <SEQ ID 8093> which encodes the amino
acid sequence <SEQ ID 8094>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2748 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2798 

A DNA sequence (GASx1501R) was identified in S.pyogenes <SEQ ID 8095> which encodes the amino
acid sequence <SEQ ID 8096>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.27 Transmembrane 64 - 80 ( 53 - 83)

Final Results

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC95443 GB:AF068901 YlmG [Streptococcus pneumoniae]
Identities = 35/81 (43%), Positives = 58/81 (71%)

Query: 1 MILILSILLRLIKVYTYLLIAYALMSWFPGAYDSKIGRLISGIVEPILKPFRAFNLQFAG 60
MI ++ ++ + +Y+ +L+A+A+MSWFPGAY+S +GR I +V+P+L P + LQ AG
Sbjct: 1 MIFLIRMIYNAVDIYSLILVAFAVMSWFPGAYESSLGRWIVALVKPVLAPLQRLPLQIAG 60

Query: 61 LDFTIFWIISLNFLAQVLVR 81
LD +++V 1+ + FL + LVR
Sbjct: 61 LDLSVWVAIVLVRFLGENLVR 81

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2799 

A DNA sequence (GASxl502) was identified in S.pyogenes <SEQ ID 8097> which encodes the amino acid 
sequence <SEQ ID 8098>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.39 Transmembrane 17 - 33 ( 17 - 33)

Final Results

bacterial membrane --- Certainty=0.1956 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2800 

A DNA sequence (GASx1507) was identified in S.pyogenes <SEQ ID 8099> which encodes the amino acid
sequence <SEQ ID 8100>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0865 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2801 

A DNA sequence (GASx1511R) was identified in S.pyogenes <SEQ ID 8101> which encodes the amino
acid sequence <SEQ ID 8102>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -11.83 Transmembrane 31 - 47 ( 22 - 53)
INTEGRAL Likelihood = -0.96 Transmembrane 2 - 18 ( 1 - 18)

Final Results

bacterial membrane --- Certainty=0.5734 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2802 

A DNA sequence (GASx1516R) was identified in S.pyogenes <SEQ ID 8103> which encodes the amino
acid sequence <SEQ ID 8104>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2729 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA96472 GB:AB036428 Dpr [Streptococcus mutans]
Identities = 132/175 (75%), Positives = 153/175 (87%)

Query: 1 MTNTLVENIYASOTHNISKlCEASKNEKTKAVIJtfQ^ 60
MTNT+ ENIYAS+ H + KKE S NEKTKAVLNQAVADLS AAS I VHQVHWYMRG GFLY
Sbjct: 1 MTNTITENIYASIIHQVEKl(ENSGNEKTKAv™QAVADLSKAASIvHQvHWYMRGSGFLY 60

Query: 61 LHPKMDELLDSLNANLDEMSERLITIGGAPYSTLAEFSKHSKLDEAKGTYDKTVAQHLAR 120
LHPKMDEL+D+LN +LDE+SERLITIGGAP+STL EF ++S+L+E GT+DK++ HL R
Sbjct: 61 LHPKMDELMDALNGHLDEISERLITIGGAPFSTLKEFDENSRLEETVGTWDKSITDHLKR 120

Query: 121 LVEVYLYLSSLYQVGLDITDEEGDAGTNDLFTAAKTEAEKTIWMLQAERGQGPAL 175
LV+VY YLSSLYQVGLD+TDEE DA +ND+FTAA+TEA+ KT IWMLQAE GQ P L
Sbjct: 121 LVQVYDYLSSLYQVGLDVTDEEDDAVSNDIFTAAQTEAQKTIWMLQAELGQAPGL 175

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2803

A DNA sequence (GASx1517) was identified in S.pyogenes <SEQ ID 8105> which encodes the amino acid
sequence <SEQ ID 8106>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.3527 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA96471 GB:AB036428 type IV prepilin peptidase homologue [Streptococcus mutans]
Identities = 55/127 (43%), Positives = 78/127 (61%), Gaps = 3/127 (2%)

Query: 83 VSASYCYLLLFSLLFSLFDWRSQEYPFILWLFSFVSLLLFYSINYLSLILLLLGLIAHLR 142
++ S LL +L SL+D + Q YP LW+ L+ Y +N +SLIL L G+ A L+
Sbjct: 91 LTTSQVCLLFMGVLLSLYDLQDQSYPLTLWIGFTFLLMFIYPLNLISLILFLFGIFAALK 150

Query: 143 PFSIGAGDFFYLASLALVLDLTSLIWLIQIASIAGITACLLLGIKRIP--FIPYLSFGLF 200
+IG+GDFFYLA+LAL L+L +IW+IQ+ASL GI LL + P F+P+L G
Sbjct: 151 NINIGSGDFFYLATLALSLNLQQIIWIIQIASLLGILYSLLFQKHKEPFAFVPFLFLG-H 209

I++ H
Sbjct: 210 LIIIFSH 216

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2804 

A DNA sequence (GASx1538R) was identified in S.pyogenes <SEQ ID 8107> which encodes the amino
acid sequence <SEQ ID 8108>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1186 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

20 Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
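
Each Example lists three candidate localizations with a certainty value and a verdict under "Final Results". As an illustration only, the sketch below shows one way such a block could be read into a dictionary and the highest-scoring compartment selected. The names `parse_localization` and `LINE` are hypothetical, and the regular expression is an assumption about the line layout; it tolerates the stray spaces inside the numbers that appear in this text.

```python
import re

# Matches e.g. "bacterial cytoplasm --- Certainty=0.1186 (Affirmative) < succ>"
# and also the space-damaged form "Certainty=0 . 1186".
LINE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]+)\)")


def parse_localization(report: str) -> dict[str, tuple[float, str]]:
    """Return {compartment: (certainty, verdict)} for each 'bacterial ...' line."""
    results = {}
    for site, value, verdict in LINE.findall(report):
        # Remove spaces that split the number before converting it.
        results[site] = (float(value.replace(" ", "")), verdict)
    return results


# Values quoted from Example 2804.
report = """
bacterial cytoplasm --- Certainty=0.1186 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_localization(report)
best = max(parsed, key=lambda site: parsed[site][0])
print(best, parsed[best])
```

Where all three certainties are 0.0000, as in some Examples, `max` simply returns the first compartment listed, so a caller would need to treat that case as "no call".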

Example 2805 

A DNA sequence (GASx1539R) was identified in S.pyogenes <SEQ ID 8109> which encodes the amino 
acid sequence <SEQ ID 8110>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -11.73 Transmembrane 6 - 22 ( 3 - 32) 

Final Results 

bacterial membrane --- Certainty=0.5692 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF31453 GB:AF221126 putative histidine kinase [Streptococcus pneumoniae] 
Identities = 141/301 (46%) , Positives = 210/301 (68%) , Gaps = 7/301 (2%) 

Query: 1 MKRYPLLVQLISYVFVIVIALITTLGLLYYQTSSRNIRQLIERDTRQSIRQSSQFIDAYI 60 

MKR LLV+++ +F++ + L+ +G YYQ+SS I IE +++ +I Q+S FI +YI 
Sbjct: 1 MKRSSLLVRMVISIFLVFLILLALVGTFYYQSSSSAIEATIEGNSQTTISQTSHFIQSYI 60 

Query: 61 KPLKETTSVIAKNTEIQAFASQIHQENDKQvLQLMKMvTATO 120 
K L+ T++ L + T++ A+A Q+ + + L +L ++ DL+ VLVTK G+ +ST 

Sbjct: 61 KKLETTSTGLTQQTDVIAYAENPSQDKVEGIRDLFLTILKSDKDLKTVVLVTKSGQVIST 120 

Query: 121 NSQLTMKTSSDMMAEPWYKARIDRCAMPILTPARQLSLSSKKEWWSVTQEVVDRAGHNL 180 
+ + MKTSSDMMAE WY+ AI + AMP+LTPAR+ S +WV+SVTQE+VD G NL 
Sbjct: 121 DDSVQMKTSSDMMAEDWYQKAIHQGAMPVLTPARK SDSQWVI SVTQELVDAKGANL 176 

Query: 181 GVIjRLDIAYPTIKASLDQLQLGRQGFAFIvNDKHEFvYHPKKSWSSSKEMAAMKPYLAI 240 
GVLRLDI+Y T++A L+QLQLG+QGFAFI+N+ HEFVYHP+ +VYSSS +M AMKPY+ 
Sbjct: 177 GVLRLDISYETLEAYLNQLQLGQQGFAFIINENHEFVYHPQHTVYSSSSKMEflMKPYIDT 236 

Query: 241 QNGYTKDKTSFVYQKLIPNSQWTLVGVASLDQLHRVQRQIFWSFSWNRASTLSDLWLCNCL 301 
GYT S+V Q+ I + WT++GV+SL++L +V+ Q+ W+ ++++ L +C CL 

Sbjct: 237 GQGYTPGHKSYVSQEKIAGTDWTVLGVSSLEKLDQVRSQLLWTL LGASVTSLLVCLCL 294 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
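
Example 2805 reports a predicted membrane-spanning segment on an "INTEGRAL Likelihood" line. The sketch below, offered only as an illustration, extracts the numeric fields from such a line. The function name `parse_integral` is hypothetical, and reading the parenthesised pair as an outer range around the core segment is an assumption, since the text does not define it.

```python
import re

PATTERN = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line: str):
    """Return the likelihood and residue ranges from an INTEGRAL line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    start, end, outer_start, outer_end = map(int, match.group(2, 3, 4, 5))
    return {
        "likelihood": float(match.group(1)),
        "core": (start, end),
        # Assumption: the parenthesised pair is a wider range around the core.
        "outer": (outer_start, outer_end),
        "core_length": end - start + 1,
    }


# Line as it appears for Example 2805.
segment = parse_integral("INTEGRAL Likelihood =-11.73 Transmembrane 6 - 22 ( 3-32)")
print(segment)
```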

Example 2806 

A DNA sequence (GASx1542R) was identified in S.pyogenes <SEQ ID 8111> which encodes the amino 
acid sequence <SEQ ID 8112>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC23101 GB:U32823 conserved hypothetical protein [Haemophilus influenzae Rd] 
Identities = 56/128 (43%) , Positives = 87/128 (67%) 


Query: 73 DFELKGIDGKTYRLSEFKGKKVYLKFWASWCSICLSTIJUJTEDLAKMSDKDYWLTVVSP 132 

D +LK ++ + LS++KGK VY+K WASWC ICL+ LA+ +DL+ D+++ V+T+VSP 
Sbjct: 24 DVQLKDLNNQPTOLSQYKGKPVYVKMWASWCPICLAGLAEIDDLSAEKDRNFEVITIVSP 83 

Query: 133 GHQGEKSEADFKKWFQGTDYKDLPVLLDPDGKLLEAYGVRSYPTEVFIGSDGVLAKKHIG 192 

H+GEK ADF +W++G +YK++ VLLD G++++ VR YP +F+ SD L K G 
Sbjct: 84 DHKGEKDTADFIEWYKGLEYKNITVXiLDEKGEIIDKARVRGYPFNLFLDSDLNLKKTVPG 143 

Query: 193 YAKKSDIK 200 
+ I+ 

Sbjct: 144 HLGAEQIR 151 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2807 

A DNA sequence (GASx1543R) was identified in S.pyogenes <SEQ ID 8113> which encodes the amino 
acid sequence <SEQ ID 8114>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC23102 GB:U32823 cytochrome C-type biogenesis protein 
[Haemophilus influenzae Rd] 

Identities = 106/224 (47%) , Positives = 138/224 (61%) , Gaps = 16/224 (7%) 

Query: 6 VLMVSVFGAGLLSFFSPCIFPVLPVYLGILLDADDSKTITIFGKKLYWYGIVKTLAFIFG 65 
+L+ +VF AGL SF SPCIFP++P+Y GIL GKK ++ T FI G 

Sbjct: 6 LLIGTVFLAGLASFLSPCIFPIIPIYFGILSKG GKK VLNTFLFILG 51 

Query: 66 LSTIFVILGYGAGFLGNILYAVWFRYLLGALVIILGIHQMGLITIKSLQFQKSLTFHNNK 125 

LS FV LG+ GFLGNIL++ R + G +VIILGIHQ+G+ I L+ K + + 
Sbjct: 52 LSLTFVSLGFSFGFLGNILFSNTTRIIAGVIVIILGIHQLGIFKIGLLERTKLVEIKTSG 111 


Query: 126 ISTRNGLFNAFILGLTFSFGWTPCTGPVIjSSVIALVASGGNGAWQGGVLMIIYTLGLGIPFL 185 

h AF+LGLTFS GWTPC+GP+L+SVLAL G+ A G +M +Y LGL PF+ 
Sbjct: 112 KSTAL-EAFVLGLTFSLGWTPCIGPIIASVIALSGDEGS-ALYGASMMFVYVLGIiATPFV 169 

Query: 186 LISFASGIVLKQFNKLKPHILLLKKVGGVLIIVMGILLMTGTLN 229 

L SF S +LK+ L H+ K GG+LIIVMGILL+T + 
Sbjct: 170 LFSFFSDSLLKRAKGLNKHLDKFKIGGGILIIVMGILLITNMFS 213 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2808 

A DNA sequence (GASx1544) was identified in S.pyogenes <SEQ ID 8115> which encodes the amino acid 
sequence <SEQ ID 8116>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1493 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2809 

A DNA sequence (GASx1546R) was identified in S.pyogenes <SEQ ID 8117> which encodes the amino 
acid sequence <SEQ ID 8118>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4658 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04061 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 48/89 (53%) , Positives = 61/89 (67%) 

Query: 1 MMVLOTYDVNTETPAGRKRLRHVAKLCVDYGQRVQNSvPECSOTPAEFVDIKHRLTQIID 60 

M+VL+TYDV T + G KRLR VAK C +YGQRVQNSVFEC V + +K LT +ID 
Sbjct: 1 MLVLITYDVQTSSMGGTKRLRKVAKACQNYGQRVQNSVFECIVDSTQLTSLKLELTSLID 60 

Query: 61 EKTDSIRFYLLGKNWQRRVETLGRSDSYD 89 

E+ DS+R Y LG N++ +VE +G S D 
Sbjct: 61 EEKDSLRIYRLGNNYKTKVEHIGAKPSID 89 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2810 

A DNA sequence (GASx1547R) was identified in S.pyogenes <SEQ ID 8119> which encodes the amino 
acid sequence <SEQ ID 8120>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.70 Transmembrane 44 - 60 ( 43 - 60) 



RGD motif: 330-332 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04060 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 162/341 (47%) , Positives = 231/341 (67%) , Gaps = 1/341 (0%) 

Query: 1 MKKLLNTLYLTQEDFYVTKEGDNIVIKQEGKVLKRFPFRIIDGIVCFSYLGVSSALVKLC 60 

MKKLLNTLY+TQ D Y++ +GDN+V+ +E + L R P ++ IV F Y G S AL+ C 
Sbjct: 1 MKKLIiOTLYVTQPDTYLSLDGDNvVLLKEQEKLGRLPLHNLEAIVGFGYT>FFATURESALMGYC 60 

Query: 61 TENQINLSFHTPQGRFCGRYIGSTNGNVLLRREHYRLSDRE-ESLEYAKRFILAKISNSR 119 

E I+++F T GRF R +G + GNV+LR+ YR+S+ + ES + A+ FI K+ NS+ 
Sbjct: 61 AERNISITFLTKNGRFIJ^WGESRGNvVLRKTQYRISENDQESTKIARNFITGKVYNSK 120 

Query: 120 KYLLRFKRDHRQQIDTKLFFAVNDELIWALEWQAftDNKDSLRGIEGQAANQYFRIFNDL 179 

L R R+H +++ + F+A + L ++ ++ D+ +SLRG EGQAA Y ++F+ + 
Sbjct: 121 WMLERMTREHPLRVNVEQFKATSQLLSVMQEIRNCDSL 180 

Query: 180 VLTDKKTFYFQGRSKRPPLDCVNALLSFGYSLLTFECQSALEAVGLDSYVGFFHTDRPGR 239 

+L K+ F F GRS+RPP D VNA+LSF Y+LL + +ALE VGLD+YVGF H DRPGR 
Sbjct: 181 ILQQKEEFAFHGRSRRPPKDNWA^SFAYTLIAITOVAAALETVGLDAYVGFMHQDRPGR 240 

Query: 240 ASIALDLVEEFRSYIVTJRFVFSLINKGQLQKKHFEVKENGSILLTENGRAIFIDLWQKRK 299 

ASLALDL+EE R DRFV SLIN+ ++ F KENG++L+T+ R F+ WQ +K 
Sbjct: 241 ASIALDLMEELRGLYADRFVLSLINRKEMTADGFYKKENGAVLMTDEARKTFLKAWQTKK 300 

Query: 300 HTEVEHPFTKEKVKLMLLPYVQAQLLAKAIRGDLESYPPFM 340 

++ HP+ EK+ L+PYVQA LLA+ +RGDL+ YPPF+ 
Sbjct: 301 QEKITHPYLGEKMSWGLVPYVQALLLARFLRGDLDEYPPFL 341 



Final Results 

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
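
Example 2810 reports an RGD motif at residues 330-332. As a sketch only, the code below locates a short motif in a sequence fragment and reports 1-based inclusive coordinates. The function name `find_motif` and the `offset` parameter are hypothetical; the fragment is the final Query line of the alignment in Example 2810 (residues 300 to 340 as printed there), so its accuracy depends on that line having been transcribed correctly.

```python
def find_motif(sequence: str, motif: str = "RGD", offset: int = 1) -> list[tuple[int, int]]:
    """Return (first, last) residue numbers of every occurrence of `motif`.

    `offset` is the residue number of the first character of `sequence`.
    """
    hits = []
    start = sequence.find(motif)
    while start != -1:
        first = offset + start
        hits.append((first, first + len(motif) - 1))
        # Continue searching one position on so overlapping hits are not missed.
        start = sequence.find(motif, start + 1)
    return hits


# Query residues 300-340 as printed in the alignment for Example 2810.
fragment = "HTEVEHPFTKEKVKLMLLPYVQAQLLAKAIRGDLESYPPFM"
hits = find_motif(fragment, offset=300)
print(hits)
```

On this fragment the search returns a single hit whose coordinates agree with the "RGD motif: 330-332" line of the Example.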

Example 2811 

A DNA sequence (GASx1548R) was identified in S.pyogenes <SEQ ID 8121> which encodes the amino 
acid sequence <SEQ ID 8122>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2247 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04059 GB:AP001508 unknown [Bacillus halodurans] 
Identities = 90/169 (53%), Positives = 111/169 (65%), Gaps = 1/169 (0%) 


Query: 45 LHTKADNPyiKEKRKELLVSRAMPISSAELGLSGIMDWEFYKDDQGVSLRGKRGKWLPK 104 

+H KAD P++KEKR L RAMPI S L +SGI DWEF +D +G+ L G G + 
Sbjct: 1 MHKKADQPFMKEKRGSKLTvRAMPIQSKNLQISGICDVVEFVQDSEGIELSGVSGSYKAF 60 

Query: 105 VVEYKRGKPKKDTRDIVQLVAQTMCLEETLDCDINEGCLYYHSVNQRVIVPMTSALRQEV 164 

VEYKRGKPKK DIVQLVAQ MCLEE L C I++G L+Y+ + RV VP+T ALR +V 
Sbjct: 61 PVEYKRGKPKKGDEDIVQLVAQAMCLEEMLVCKIDKGYLFYNEIKHRVEVPITDALRDKV 120 

Query: 165 KELAAEMHEVYQSQMLPKAAYFKNCQLCSLVDICKPRLSKKTRSVSRYI 213 
++A EMH Y+++ PK C CSL IC P+L K RSV RYI 

Sbjct: 121 VQMAKEMHHYYENRHTPKVKTGPFCNNCSLQSICLPKLMNK-RSVKRYI 168 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2812 

A DNA sequence (GASx1549R) was identified in S.pyogenes <SEQ ID 8123> which encodes the amino 
acid sequence <SEQ ID 8124>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1399 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04058 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 

Identities = 148/290 (51%) , Positives = 190/290 (65%) , Gaps = 19/290 (6%) 

Query: 6 MLEHKIDFIWTLE VTCEANANGDPLNGNMPRTDAKGYGVMSDVS I KRKIRNRLQDMGKS I F 65 
+L+HKIDF V L V +AN NGDPLNGN PR + G+G +SDV+IKRKIRNRL DM + IF 
Sbjct: 3 ILDHKIDFAVILSWKANPNGDPLNGNRPRQNYDGHGEISDVAIKRKIRNRLLDMEEPIF 62 

Query: 66 VQANERIEDDFRSLEKRFSQH FTAKTPDKEIEEKANAL WFDVRAFGQVFTYLK 118 
VQ+++R D F+SL R + K + ++E A W DVR+FGQVF + 
Sbjct: 63 VQSDDRKADSFKSLRDRADSNPELAKMLKAKNASVDEFAKIACQEWMDVRSFGQVFAFKG 122 

Query: 119 K--SIGVRGPVSISMAKSLEPIVISSLQITRSTNGMEAKNNSGRSSDTMGTKHFVDYGVY 176 
S+GVRGPVSI A S++PI I S QIT+S N + RSSDTMG KH VD+GVY 
Sbjct: 123 SNLSVGVRGPVSIHTATSIDPIDIVSTQITKSVNSVTGDK RSSDTMGMKHRVDFGVY 179 

Query: 177 VLKGSINAYFAEKTGFSQEDAEAIKEVLVSLFEHDASSARPEGSMRVCEVFWFTHSSKLG 236 
V KGSIN AEKTGF+ EDAE IK L++LFEND+SSARP+GSM V +V+W+ HSSKLG 
Sbjct: 180 VFKGSINTQLAEKTGFTNEDAEKIKRALITLFENDSSSARPDGSMEVHKVYWWEHSSKLG 239 

Query: 237 WSSARVFDLLEYHQSIEEKSTYDAYQIHLNQEKLAKYEAKGLTLEILEG 286 
SSA+V L+ + ++D Y + L YE GL +E+++G 
Sbjct: 240 QYSSAKVHRSLKIESKTDTPKSFDDYAVEL YELDGLGVEVIDG 282 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2813 

A DNA sequence (GASx1550R) was identified in S.pyogenes <SEQ ID 8125> which encodes the amino 
acid sequence <SEQ ID 8126>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2882 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04057 GB:AP001508 unknown [Bacillus halodurans] 
Identities = 176/671 (26%), Positives = 311/671 (46%), Gaps = 87/671 (12%) 



Query: 1 MDFFTSLLKTYEKAELADLVDHQKR- -NNEPVLLPI YHTSLKSNGKNI I SVKLDKDGQFH 58 
M + L +TYE A h + K+ + E LLPI HT+ ++ IV LD+DG F 
Sbjct: 1 MSWLLHLYETYE - ANLDQVGKTVKKGEDREYTLLPI SHTTQNAH IEVTLDEDGDFL 55 

Query: 59 KAEFMADKQMI I FPVTADSVARSGSHPAPHPLVDKFAYYSAEM GQIQ YDS 108 
+A+ + K+ + P T ++ +RSGS AP+PL DK +Y + + G+I+ +D+ 
Sbjct: 56 RAKALT-KESTLIPCTEEAASRSGSKVAPYPLHDKLSYVAGDFVKYGGKIKNQDDAPFDT 114 

Query: 109 FHKQLNNWID--YCEEGDVKKFLTFVQQFILKPEFLTLILDSLIGPDYQHNQLKVTFCDA 166 
+ KL W+ Y E VK T++++ L+++L NQ+ + 
Sbjct: 115 YIKNLGEWANSPYATE-KVKCIYTYLKKGRLIEDLVDAGVLKL DENQQLIEKWEK 168 

Query: 167 TGKEKLIDLSACFLEFSIDQ FQGFKNESVSTF KALHQSYI SFVEANRENLG 217 
+EL+AF +DQ FF ES+ K + S+ISF 
Sbjct: 169 RYEELLGEKPAIFSSGATDQASAFVRFNVFHPESIDDVWKDKEMFDSFISFYNDKLGEED 228 

Query: 218 I CNI SGREEQLTDKH RGLMGNAKIISVS-NKREAYKGRFREREDVFSVGYETSEKI 272 
IC ++G T++H R AK+IS + N ++GRF+ + + YE S+K 
Sbjct: 229 ICFVTGNRLPSTERHANKIRHAADKAKLISANDNSGFTFRGRFKTSREAVGISYEVSQKA 288 

Query: 273 HLMLKYLLENKNTSTWLGSSQYLINWFSDD-LTNDSRLDIVSPIFDDGLEEDDDDDTPPV 331 
H LK+L+ ++ S + + W +D+ L + DV+ E + DDT + 
Sbjct: 289 HNALKWLIHRQSKSI DDRVFLVWSNDNSLVPNPDEDAVDIMKHANRELERDPDTGQI 345 

Query: 332 ITLATEDNKRIGKSFIKGQKLFANDATY YVAIMIKTSNGRIALKYFRQLQASQLLT 387 
A E K IG + +D Y ++ +L+ + GR+A+ Y+R L L 
Sbjct: 346 395 

Query: 388 NLNKWQETYSWESRSKFGKSRLRT PTFHDILOTSYGVDRDRFLELDNDNFKSDQIQ 443 
L W ++ +WE R + + + P DI +YG ++ D ++ 
Sbjct: 396 RLEAWHDSCAWEHRYRRDEKEFISFYGAPATKDIAFAAYGPRA SEKVIKDLME 448 

Query: 444 KLVASLIDGKPMPQSIVKKL- - -GNNVKERHRYRKHWYQVEQVCLAILHK- - -QNGEEFS 497 
+++ ++DG+ +P+ IV+ +N R+ W + + A++ K + EE+ 
Sbjct: 449 RMLPCIVDGRRVPKDIVRSAFQRASNPVSMERWE--WEKTLSITCALIRKMHIEQKEEWG 506 

Query: 498 PMLDHTNQNRSYLFGRLIAIFEL1ETLRYGI£)GNNNDRITNAERYWTAYTGQPTKLMMLL 557 
LD ++ +RSYLFGRLLA+ +++E G G + R TMA RY +Y+ P + + 
Sbjct: 507 VPLDKSSTDRSYLFGRLLAVADVLER GALGKDETRATNAIRYMNSYSKNPGRTWKTI 563 

Query: 558 ENKIKPYEEPLKLNRRGSWMKLEKEKEEILEIiLNPLLETETMEKPLDYRFIFGYYAEKNY 617 
+ ++PY+ KL + ++ L K +EI + P + PL +++ G+Y+++ 
Sbjct: 564 QESLQPYQ- - AKLGTKATY - -LSKLVDEIGDQFEP- - -GDFNNNPLTEQYLLGFYSQRRE 616 

Query: 618 YYTKQNTEVTE 628 
Y K+ E + 
Sbjct: 617 LYKKKEEETNQ 627 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2814 

A DNA sequence (GASx1551R) was identified in S.pyogenes <SEQ ID 8127> which encodes the amino 
acid sequence <SEQ ID 8128>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3035 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04056 GB:AP001508 unknown [Bacillus halodurans] 
Identities = 90/218 (41%) , Positives = 127/218 (57%) , Gaps = 7/218 (3%) 

Query: 13 GQRALFTNPATKGGSERSSYSVPTRQALNGIVDAIYYKPTFTNIVTEVKVINQIQTELQG 72 

G ALFT+P TK G E+ SYSVPT QAL GI ++IY+KPT ++ E++V+ IQ E +G 
Sbjct: 11 GDYALFTDPLTKIGGEKLSYSVPTYQALKGIAESIYWKPTIVFVIDELRVMKPIQMESKG 70 

Query: 73 VRALLHDYSADLSYVSYLSDVVYLIKFHFVWNEDRKDLNSDRLPAKHEAIMERSIRKGGR 132 
VR + + L++ +YL DV Y +K HF +N R DL DR KH +I++RS++ GGR 

Sbjct: 71 TOPIEYGGGNTIAHYTYLKDVHYQVKAHFEFNLHRPDLAFDRNEGKHYSILQRSLKAGGR 130 

Query: 133 RDVFLGTRECLGLVDDISQEEYETTVSYYNGV-NIDLGIMFHSFAYPKDK-KTPLKSYFT 190 
RD+FLG REC G V + E+ + +Y+G LGMHFYP+ + L 

Sbjct: 131 RDIFLGARECQGYV APCEFGSGDGFYDGQGKYHLGTMVHGFNYPDETGQHQLDVRLW 187 






Query: 191 KTVMKNGVITFKAQSECDI VNTLSSYAFKA- - PEEIKS 226 

VM+NG IF +C IV + K P+ ++S 

Sbjct: 188 SAVMENGYIQFPRPEDCPIVRPVKEMEPKIFNPDNVQS 225 
Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2815 

A DNA sequence (GASx1552R) was identified in S.pyogenes <SEQ ID 8129> which encodes the amino 
acid sequence <SEQ ID 8130>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2770 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04055 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 252/836 (30%) , Positives = 404/836 (48%) , Gaps = 90/836 (10%) 



Query: 3 MIIAHYDCKKDKKQSLDEHLWHVACSSRQEASIIGQGDVLFLIGLYHDLGKADRTFQD-- 60 
M +AH Q+L EHL VC+ ++ VLGL HDLGK F+D 
Sbjct: 1 MYIAHIREVDKVIQTLKEHLCGVQCIAETFGRKIiRLQHVAGIiAGLLHDLGKYTNEFKDYI 60 

Query: 61 KLLNNPNRHVDHSYAGAKYLCSI IGPHLKNRGVDKNERMTFNEMVGYVI SAHH 113 
+L VDHS AG + L + L +R +E++ E+VG I +HH 
Sbjct: 61 YKAVFEPELAEKKRGQVDHSTAGGRLLYQM LHDRENSFHEKL - LAE WGNAI I SHH 115 

Query: 114 GMYDLCYYFDDAEYYGFNKFKNRINRDLDGYHYHEDIKGYALKLEKKLCDYGYK-DLREL 172 
+Y N + R L+ +++ Y +E+ + + +L 
Sbjct: 116 SNLQ DYISPTIESNFLTRVLE KELPEYESAVERFFQEVMTEAELARY 162 

Query: 173 IDKAFDNYQQAMSSLNWQDKSEWDYYQSCMVRLYLSLLKNADILDTVNAYGLKISPMDKT 232 
+ KA D +Q + Q Y SC++ +AD +T + + + T 
Sbjct: 163 VAKAVDEIKQFTDNSPTQSFFLTKYIFSCLI DADRTNT - RMFDEQAREEEPT 213 

Query: 233 ERSFLKHSYLAAIEQKYASFGQPNNQ LNTIRTEIAERVKERGKRDSKGIYRLDLPTG 289 
+ L Y + AS + ++ +N +R+ ++E+ + R S GIY L +PTG 
Sbjct: 214 QPQQLFEHYHQQLENHLASLKESDSAQKPINVLRSAMSEQCESFAMRPS-GIYTLSIPTG 272 

Query: 290 AGKTNLSMRYAFHQLVHHDKSRFFYITPFLSVLEQNASEIRKVTGD-LGVLEHHSNWKQ 348 
GKT S+RYA ++K R YI PF +++EQNA E+R + GD +LEHHSNW+ 
Sbjct: 273 GGKTnASLRYALKHAQEYNKQRIIYIVPFTTIIEQNAQEVRNILGDDENILEHHSNWED 332 

Query: 349 ANEDDDDKDSLLSA YLSDSWDSQWLTSMVQFFQTLFKTKSANLRRFSSLINSW 403 
+ D+ +D +++ D+WD ++ T++VQF + + N RR +L +SV+ 
Sbjct: 333 SENGDEQEDGVITKKERLRLARDNWDRPIIFTTLVQFLNVFYAKGNRNTRRLHNLSHSVL 392 

Query: 404 ILDEVQSLPIEVTTLFNLTMNFI^KViyDTTIVLCTATQPAYDSSEIDHRICYGGNLGErA 463 
I DEVQ +P + +LFN +NFL + +I+LCTATQP ++ + H + + 
Sbjct: 393 IFDEVQKVPTKCVSLFNEALNFLKEFAHCSILLCTATQPTLEN- -VKHSLLKDRD G 446 

Query: 464 EIVELTIEEKQIFSRTELRKFDDSDQKVHLTDVINLILGEE NS VLAI FNTKKTVHNC 520 
EIV+ E + F R E+ D +DQ + + + E S L I NTKK V + 
Sbjct: 447 EIVQNLTEVSFAFIOlvEI- -LDKTDQPMTNERLAEWVRDEAPSWGSTLIILNTKKVVKDL 504 

Query: 521 YTMLKDMTDRPVYQLSTNMCAQHRLDLIAKIKTELQNNI P I I CI STQLIEAGVD VDFHRV 580 
Y L+ PV+ LST+MCA HR D + +1+ L+ P IC++TQLIEAGVDV F V 
Sbjct: 505 YEKLEG-GPLPVFHLSTSMCAAHRKDQLDEIRALLKEGTPFICVTTQLIEAGVDVSFKCV 563 

Query: 581 IRSYSGIDSIVQAAGRCNREGKRDKGQVTLVNLTNEEENISRLTEIKTKKEATESILHKI 640 
IRS +G+DSI QAAGRCNR G+ V +++ + EE +S+L EI+ +E ++L + 

Sbjct: 564 IRSLAGLDSIAQAAGRCNRHGEEQLQYVYVID--HAEETLSKLKEIEVGQEIAGNVLARF 621 

Query: 641 GSPIDISTLN RDFFEYYYANNQGLMDYPLED NLSIYDYLSLNIYQTAN 688 

+ N R++F YYY+ ++Y +++ + + N Y T 

Sbjct: 622 KKKAEKYEGNLLSQAAMREYFRYYYSKNTOANLNYF\^TO 681 

Query: 689 KKFKGK LKQAFKTAGAKMNLINNDMIGILVPYGEAEKKLAYLEELGVSHFLSAKD 743 

+K G L ++KTA +I+ + +VPYGE + +A L S + 

Sbjct: 682 QKNTGTHFPLLLNGSYKTAADHFRVIDQNTTSAIVPYGEGQDIIAQLN SGEW 733 

Query: 744 YQTIKSLLKELQPFTVNV- -RENDPLFE- -TTKBYLNGQILVLTSEYYDTERGVKY 795 

+ +LK+ Q +TVN+ +E D L + +L+G + h +Y + GV + 

Sbjct: 734 VDDLSKVLKKAQQYTVNLYSQEIDQLKKEGAIVMHLDGMVYELKESWYSHQYGVDF 789 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2816 

A DNA sequence (GASx1558) was identified in S.pyogenes <SEQ ID 8131> which encodes the amino acid 
sequence <SEQ ID 8132>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1050 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2817 

A DNA sequence (GASx1563) was identified in S.pyogenes <SEQ ID 8133> which encodes the amino acid 
sequence <SEQ ID 8134>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1872 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
Example 2818 

A DNA sequence (GASx1564R) was identified in S.pyogenes <SEQ ID 8135> which encodes the amino 
acid sequence <SEQ ID 8136>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2173 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2819 

A DNA sequence (GASx1566R) was identified in S.pyogenes <SEQ ID 8137> which encodes the amino 
acid sequence <SEQ ID 8138>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3486 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2820 

A DNA sequence (GASx1568) was identified in S.pyogenes <SEQ ID 8139> which encodes the amino acid 
sequence <SEQ ID 8140>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2711 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 



WO 02/34771 -2823- PCT/GB01/04789 



Example 2821 

A DNA sequence (GASx1569) was identified in S.pyogenes <SEQ ID 8141> which encodes the amino acid 
sequence <SEQ ID 8142>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2822 

A DNA sequence (GASx1576R) was identified in S.pyogenes <SEQ ID 8143> which encodes the amino 
acid sequence <SEQ ID 8144>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4042 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2823 

A DNA sequence (GASx1577R) was identified in S.pyogenes <SEQ ID 8145> which encodes the amino 
acid sequence <SEQ ID 8146>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3342 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04515 GB:AP001509 unknown [Bacillus halodurans] 
Identities = 36/104 (34%) , Positives = 55/104 (52%) 
Query: 2 HMGAWNTGmKILYTQESVTDDMIAKRDQSIKDaKESPILGFTVDTKVIKTELSNISNVM 61 

+M ++ GN IL E D + + A SP LGF D+ ++TE++ ISNV 

Sbjct: 392 NMPSFAIGNQLILKLYEDDPQDKWEAFEAFNESAIPSPALGFYFDSNPWTEIAAISNVT 451 


Query: 62 NRYKASINTGTVDPDEALPKLLADLKGAGWDKVQKEVQKQLDDF 105 

+ + ++ G VDP+E LP L AG KV E+Q+Q D++ 

Sbjct: 452 SEFSPALLKGAVDPEEYLPLFNDKLNEAGLQKVIDEMQRQFDEW 495 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2824 

A DNA sequence (GASx1578R) was identified in S.pyogenes <SEQ ID 8147> which encodes the amino 
acid sequence <SEQ ID 8148>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 
The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04515 GB:AP001509 unknown [Bacillus halodurans] 
Identities = 134/346 (38%) , Positives = 206/346 (58%) , Gaps = 10/346 (2%) 

Query: 21 AACESKSASKDSDVKLLMYQVGDKPDNFDELMTIANKR1KEKTGATVDLQYIGWGDWDDK 80 
+A E+++ DVLY+G + + +M N +EK ATVDL+ + WG++D++ 

Sbjct: 42 SANETEATDLDH-VTLTWYMIGTPQPDLELVMEEVNAYTEEKINATVDLRMLDWGEYDER 100 

Query: 81 MSTI IASGENYDIAF ANNYWNAQKGAFADLTTLMPKYAKKTYKNLDPAYI KGNTI 136 

M I SGE YDIAF ANNY +NA++GAF +L L+ ++ ++ + +DPA+++G + 
Sbjct: 101 MQVITTSGEAYDIAFTSSWANNYALNARRGAFLELNDLLDEHGQEMKELIDPAFLEGAQV 160 

Query: 137 DGKLYAFPVDANVYAQQMLSFNKELVDKYGLDISNIKSYADAENVLKQFHEKEPNTAAFA 196 

DGKLYA P + V Q +LSFN ELV+K+ LD+S++ S AD E +L E+E + A 
Sbjct: 161 DGKLYAVPTNKEVGQQAVLSFNNELVEKHNLDLSSVHSLADLEPLLAVIKEEESDVTPIA 220 


Query: 197 IGQVFSMSGDYDYPLTKTQPFAVKIDEGKPTI INQYEDESFKNNLRLMHKWYKEGLI PTD 256 

F +D L + PFA +++ +IN+YE++ L+ MH +YK+G I D 

Sbjct: 221 TFDAYLPFDSILQEEMPFAFRLEGNTNEVINKYEEDITMETLKTMHDYYKKGYIRPD 277 

Query: 257 AATNTEGYPLEGNTWFMREETQGPMDYGDTILTNAAGKDIVSRPLTKPLKTTSQAQMANF 316 

AAT+T+ +PLE WF+R+E P Y + I T AG +1 +RPL +P + + 
Sbjct: 278 AATSTDSWPLETPNWFVRKELYQP--YAELIWTRTAGYEIATRPLHEPYIFNNSVTGSMQ 335 

Query: 317 WSSVSKNKEKAVEVLSLLNSDPELLNGLVYGVEGKAWEKIGDKKI 362 
+S+ SKN E+A+ L+LLNSDP L N L G+EG +E++ D I 

Sbjct: 336 AISATSKNPERAMMFLNLLNSDPYLRNLLDKGIEGVHYEELEDGTI 381 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
Example 2825 

A DNA sequence (GASx1582) was identified in S.pyogenes <SEQ ID 8149> which encodes the amino acid 
sequence <SEQ ID 8150>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0454 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2826 

A DNA sequence (GASx1584R) was identified in S.pyogenes <SEQ ID 8151> which encodes the amino
acid sequence <SEQ ID 8152>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3105 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif: 3-5

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG21428 GB:AF307332 meningioma-expressed antigen 5s splice 
variant [Homo sapiens] 
Identities = 94/271 (34%) , Positives = 148/271 (53%) , Gaps = 14/271 (5%)

Query: 120 GIIEGFYGTPWTREERLDCLRFIGNKRMNTYMYAPKDDDYQRKLWRDLYPEDWVTYFKEL 179
G++EGFYG PW E+R + R + +NTY+YAPKDD R WR++Y + L
Sbjct: 63 GVVEGFYGRPWVMEQRKELFRRLQKWELNTYLYAPKDDYKHRMFWREMYSVEEAEQLMTL 122

Query: 180 IAVAKEEGLDFWYMISPGLDFDYTKEADYQLLYQKLQQLLALGVCHFGLLLDDIDYQIVD 239
++ A+E ++F Y ISPGLD ++ + L +KL Q+ G F LL DDID+ +
Sbjct: 123 ISAAREYEIEFIYAISPGLDITFSNPKEVSTLKRKLDQVSQFGCRSFALLFDDIDHNMCA 182

Query: 240 AVERRFKKTAYAQAHIATEVHHFLNQQHAAPELVI CPTE YDNHHDSIYLQELSE 293
A + F A+AQ + E++ +L + + CPTE Y N S YL+ + E
Sbjct: 183 ADKEVFSSFAHAQVSITNEIYQYLGEPETFLFCPTEYCGTFCYPNVSQSPYLRTVGE 239

Query: 294 RIPKEVAFFWTGPSTLASQISQADIETMAAVYQRPIIIWDNIPVNDYQKDPERLFLTPFA 353
++ + WTGP ++ +I IE ++ + +R +IWDNI NDY D +RLFL P+
Sbjct: 240 KLLPGIEVLWTGPKWSKEIPVESIEEVSKIIKRAPVIWDNIHANDY--DQKRLFLGPYK 297

Query: 354 NRS PFLCQPDYQVKGI VSNPMI SWELSKLTL 384
RS L ++KG+++NP +E + + +
Sbjct: 298 GRSTELIP RLKGVLTNPNCEFEANYVAI 325

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 
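
The homology summaries in these examples all follow the same "Identities / Positives / Gaps" pattern, so they can be read mechanically. The following is a minimal sketch, not part of the original analysis: the names `SUMMARY_RE`, `parse_blast_summary` and `recomputed_percent` are hypothetical, and the truncating percentage is only an assumed convention used as a sanity check. For this example the recomputed Positives value comes out one point higher than the reported 53%, so a small difference should not be read as an error in the listing.

```python
import re

# Hypothetical helper: matches "Identities = 94/271 (34%)" style fields.
# Optional whitespace tolerates scan spacing such as "(34%) ," before commas.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY_RE.findall(line)
    }


def recomputed_percent(count, length):
    """Integer percentage, truncated (an assumed convention, for checking only)."""
    return (100 * count) // length


summary = parse_blast_summary(
    "Identities = 94/271 (34%) , Positives = 148/271 (53%) , Gaps = 14/271 (5%)"
)
for name, (count, length, reported) in summary.items():
    print(name, count, length, reported, recomputed_percent(count, length))
```
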

Example 2827 

A DNA sequence (GASx1585R) was identified in S.pyogenes <SEQ ID 8153> which encodes the amino
acid sequence <SEQ ID 8154>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4469 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
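
Each example reports a "Final Results" block with one certainty value per compartment. As an illustration only, the sketch below reads such a block and returns the compartment with the highest certainty. The names `RESULT_RE`, `parse_final_results` and `predicted_location` are hypothetical, and the tolerance for spaces inside the number is an assumption made to cope with scanned text, not a property of the prediction program.

```python
import re

# Hypothetical pattern for lines such as
#   "bacterial cytoplasm Certainty=0 .4469 (Affirmative) < suco"
# Spaces inside the number ("0 . 0000", "0. 3082") are tolerated.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s+[-\s]*Certainty\s*[=-]\s*"
    r"(\d)\s*\.\s*(\d+)\s*\((Affirmative|Not Clear)\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Map each compartment to (certainty, call)."""
    results = {}
    for location, whole, frac, call in RESULT_RE.findall(text):
        results[location.lower()] = (float(f"{whole}.{frac}"), call)
    return results


def predicted_location(results):
    """Compartment with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda location: results[location][0])


block = """
bacterial cytoplasm Certainty=0 .4469 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(block)
print(predicted_location(results), results)
```
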

Example 2828 

A DNA sequence (GASx1587) was identified in S.pyogenes <SEQ ID 8155> which encodes the amino acid
sequence <SEQ ID 8156>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3082 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04509 GB:AP001509 unknown conserved protein in others
[Bacillus halodurans]
Identities = 221/425 (52%) , Positives = 296/425 (69%) , Gaps = 4/425 (0%)



Query: 12 RPIPTSVSQFMAKVESLCGDQHPDWALNFKTSFTNTLETTLKTYEDGTSFLLTGDIPAMW 71
+ IP S+ +A+V++ D L F+ F NT TT++ E GT F++TGDIPAMW
Sbjct: 4 KKIPRSLQAIIAQVKAHYADDQELQTL-FEQCFLNTYLTTIQEDEQGT-FVVTGDIPAMW 61

Query: 72 LRDSTAQMKPYLFIAKEDEEIRKIIAGLVKRQFRYICIDPYANAFNEEANEKGHQTDHTQ 131
LRDS+AQ++PYL + KED ++ ++I G+++RQ+RYI DPYANAFN+ AN++GHQ D T+
Sbjct: 62 LRDSSAQVRPYLTWKEDADMARMIKGVIERQWRYILHDPYANAFNQTANKQGHQQDRTE 121

Query: 132 MNPWIWERKYEIDCLCYPIQLAYLLYRETGSTDQFNDDFHRGVELILDLWTVEQDH-AQS 190
M+P +WERKYE+D LCYPIQLAYL ++ TG + +E I +W +EQDH A+S
Sbjct: 122 MSPLVWERKYELDSLCYPIQLAYLYWKATGDDSVLQPTLKQVLETIYRIWKIEQDHEAKS 181

Query: 191 PYLFERDTWRKEDTLTHAGKGSPVAPTGMTWSGFRPSDDACQYGYLIPSNMFAVVVLSYL 250
Y FERD R DTL GKG PTGMTWSGFRPSDDAC YGYLIP+NMFAVW +Y
Sbjct: 182 SYSFERDDCRVSDTLLRKGKGGYSVPTGMTWSGFRPSDDACLYGYLIPANMFAWVSNYA 241

Query: 251 EDLYNNLFHNEPVATRAKQLKEAIQSGIADHALVQNSKGETIYAYEVDGLGQFSIMDDAN 310
+L + +A ++L+ I+ GI + + + IY YE DG G+ ++MDDAN
Sbjct: 242 VELLTAM-EEIKLAEEFRELEADIRQGIGQYGKMDHPVYGEIYVYETDGNGRVNLMDDAN 300

Query: 311 IPSLIAAPYLGFCTKDDPIYLATRRTILSQENPYYYQGNAAAGIGSSHTPENYIWHIALA 370
+PSLLA PYLG+ T DDP+Y TRR ILS++NPYYY+G+ A G+GS HTP++Y+WHI+LA
Sbjct: 301 VPSLLAIPYLGYTTADDPVYQNTRRFILSRDNPYYYEGSYAKGVGSPHTPDHYVWHISLA 360

Query: 371 LQGLTALDQDSKKEMLDLLVATDAGTHLMHEGFDVNDPYQYTREWFSWANMMFCELLLDY 430
+QG+TA+D KK+++ + T A T+ MHEGFDV+ P QYTR WF+WAN MF E LL
Sbjct: 361 IQGMTAIDSKEKKQIVAMFKQTHADTYFMHEGFDVDRPEQYTRSWFAWANSMFSEFLLSE 420

Query: 431 LGFSI 435
G +
Sbjct: 421 AGIYV 425





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
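
Because alignment rows carry their own start and end coordinates, a row that has lost characters in reproduction can be detected by comparing the residue count with the coordinate span. The sketch below is illustrative only: `ROW_RE` and `check_row` are hypothetical names, and it assumes each row is written on one line as `Query: start sequence end` (or `Sbjct: ...`), with gaps shown as `-` and any stray spaces ignored.

```python
import re

# Hypothetical row pattern: label, start coordinate, sequence text, end coordinate.
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.*?)\s+(\d+)\s*$")


def check_row(line):
    """Return (label, expected_residues, counted_residues), or None if no match."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    label = match.group(1)
    start, end = int(match.group(2)), int(match.group(4))
    # Count letters only: gap dashes, stray spaces and digits are not residues.
    counted = sum(1 for ch in match.group(3) if ch.isalpha())
    return label, end - start + 1, counted


rows = [
    "Query: 72 LRDSTAQMKPYLFIAKEDEEIRKI IAGLVKRQFRYI CIDPYANAFNEEANEKGHQTDHTQ 131",
    "Sbjct: 62 LRDSSAQVRPYLTWKEDADMARMIKGVIERQWRYILHDPYANAFNQTANKQGHQQDRTE 121",
]
for row in rows:
    label, expected, counted = check_row(row)
    status = "ok" if expected == counted else "residue count differs from coordinates"
    print(label, expected, counted, status)
```

In this pair the query row is complete (60 residues for positions 72 to 131), while the subject row as reproduced contains 59 letters for a 60-residue span, so it should be treated with caution.
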

Example 2829 

A DNA sequence (GASx1588) was identified in S.pyogenes <SEQ ID 8157> which encodes the amino acid
sequence <SEQ ID 8158>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5250 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04508 GB:AP001509 unknown conserved protein in others 
(divided) [Bacillus halodurans] 
Identities = 312/737 (42%) , Positives = 426/737 (57%) , Gaps = 21/737 (2%) 



Query: 123 FPDTFGNMGQTPQLMLKAGLQAAAFGRGIRPTGFNNQVDTSEKYSSQFSEISWQGPDNSR 182
FPDTFG GQ PQL+ +AG++AA FGRG+ PTGFNNQV + YSS FSE+ W+ PD S+
Sbjct: 4 FPDTFGIYGQAPQLLAQAGIRAAVFGRGVTPTGFNNQVQHDD-YSSPFSELIWEAPDGSQ 62

Query: 183 ILGLLFANWYSNGNEIPTTEAEARLFWDKKLADAERFASTKHLLMMNGCDHQPVQLDVTK 242
++G+L ANWYSNGNEIPT E EA+ FW KKL DAERFAST LL MNGCDHQPVQ DVT+
Sbjct: 63 VIGILLANWYSNGNEIPTDEDEAQTFWVKKLRDAERFASTSQLLFMNGCDHQPVQKDVTQ 122

Query: 243 AIAIjANQLYPDYEFvBSCFEDYI^LADDLPENLSTVQGEITSQETDGWYTLANTASARI 302
AI +A L+PD F HS F DYL + ++LP+ L + GE+ +Q+TDGW TL NTASARI
Sbjct: 123 AIKVAETLFPDVAFKHSNFHDYLTQIKEELPKELQKITGELRNQKTDGWSTLVNTASARI 182

Query: 303 YLKQANTRVSRQLENITEPLAAMAYEVTSTYPHDQLRYAWKTLMQNHPHDSICGCSVDSV 362
YLKQAN R L N+ EP+ + + D Y WK LM+NHPHDSICGCS+D+V
Sbjct: 183 YLKQANDRCQTLLTNVLEPMCLLV--F^KSLHRDFSEYYWKLLMENHPHDSICGCSIDAV 240

Query: 363 HREMTRFEKAYEVGHYLAKEARKQIADAIDTRDFP^SQPFVLFNTSGHSKTSVAELSL 422
HREM TRFEK E K+IA I+T ++ P V+ T+G S V +
Sbjct: 241 HREMKTRFEKVEAGATTFIAEQGKEIARQINTLHDSEEAIPIiVVLKTNGTSGKRVVRHKV 300

Query: 423 TWKKYHFGQRFPKEVYQEAQEYLARLSQSFQIIDTSGQVRPEAEILGTSIAFDYDLPKRS 482
KK +F + ++ + L + ++ + E+ + F YDLP+
Sbjct: 301 AMKKIYFDEM DFRHIPDRLKEIVMPTYRLEFPNKGSVPIEVQDAGVRFGYDLPRDG 356

Query: 483 FREPYFAI KVRLRLPI TLPAMSWKTLALKLG NETTPSETVSLYDDSNQCLENGF 536
FR PY+A L+T S L+G +T+ +DS LEN
Sbjct: 357 FRRPYYA RELEVTFSYDSDLYLGYECGFLVPVEEKQTEARKELIGDPSMNTLENEA 412

Query: 537 LKVMIQTDGRLTITDKQSGLIYQDLLRFEDCGDIGNEYISRQPN1JDQPFYADQGTIKLNI 596
+KVMI +G +I DK +G Y+ L +ED GDIGNEY+ + + + + +I
Sbjct: 413 MKVMIHRNGSYSILDKTTGFEYRHLGIYEDVGDIGNEYMFKASSDGVRYTTEACFASIRI 472

Query: 597 ISNTAQVAELEIQQTFAIPISADKLLQAEMEAVIDITERQARRSQEKAELTLTTLIRMEK 656
I N + A +EI QT ++P +AD+ L+ E E ++ +R+A RS+E+ ++TL T + +E+
Sbjct: 473 IENNSLCATVEICQTLSVPAAADERLKEEQERLVWHPDRKAGRSKERTDITLRTELTLEQ 532

Query: 657 1OTPRLQFTTRFDNQMTNHRLRVLFPTHLKTDHHLADSIFETVKRPNHPDATFWKNPSNPQ 716
L+ DN +HR+R LFP +H ADSI+E V+RPN PD W+NP+
Sbjct: 533 GAKGLKWIWNIDNTAKDHRMRALFPVERARGNHYADSIYEIVERPNTPDPK-WQNPAFDH 591

Query: 717 HQECFVSLFDGENGVTIGNYGLNEYEILPDTNTIAITLLRSVGEMGDWGYFPTPEAQCLG 776
H + VSL +GE G+TI GL+EYEI+ D +IA+TLLRSVGE+GDWG F TPEAQC G
Sbjct: 592 HMQRLVSLDNGEYGLTIATKGLHEYEIVSD--SIAVTLLRSVGELGDWGLFETPEAQCFG 649

Query: 777 KHSLSYSFESITKQTQFAS-YWRAQEGQVPVITTQTNQHEGTIAAEYSYLTGTNDQVALT 835
+++ A+YA+V QT Q G L + + + + LT
Sbjct: 650 QNEAQFVLLPHKGDVLSANVYVAAYDDPVEPTVIQTEQSMGPLPHATNLFQWSGEGLVLT 709

Query: 836 AFKRRLADNALITRSYN 852
A K + +I R +N
Sbjct: 710 ACKPTMDGRGMILRWFN 726





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2830 

A DNA sequence (GASx1589R) was identified in S.pyogenes <SEQ ID 8159> which encodes the amino
acid sequence <SEQ ID 8160>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.30 Transmembrane 203 - 219 ( 195 - 221)
INTEGRAL Likelihood = -8.17 Transmembrane 61 - 77 ( 59 - 82)
INTEGRAL Likelihood = -3.98 Transmembrane 107 - 123 ( 107 - 124)
INTEGRAL Likelihood = -3.40 Transmembrane 39 - 55 ( 38 - 58)
INTEGRAL Likelihood = -2.34 Transmembrane 129 - 145 ( 126 - 145)
INTEGRAL Likelihood = -2.07 Transmembrane 89 - 105 ( 87 - 105)

Final Results

bacterial membrane Certainty=0.5522 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC10175 GB:AJ278302 histidine kinase [Streptococcus pneumoniae] 
Identities = 114/432 (26%) , Positives = 219/432 (50%) , Gaps = 10/432 (2%) 

Query: 21 LTLKLFS FVSAI PLRLKNI FYLSLSMVLFQWFWAFFPDHFI LDWMLAQF LFFALI 77
L + +F V I L + IF L +L WF +++ V L+ F L+ +
Sbjct: 16 LKIVI FFKVDGI SLTFERI FKAFLFKILLAWFGML GYMVGNVYLSYFMEPLYGIGL 72

Query: 78 ALYYGKS I KAKFLMFYAFFPLVS I SLVKRFIVFFVMPLFGMPYS WKHNTLLI YS ITCFS 137
+ + + K L+FY FP++ ++L R + +FV+P G V + + I F+
Sbjct: 73 SFLLLRELPKIOjLLFYGLFPMILvlJLFYRGVSYFVLPFLGQG-QVYDDYSFIWLCIIIFN 131

Query: 138 IFLIYRCIQVFHFDFSTWRQYFQSHRASKLLVFTNSSMALYYLCVQGIDVMSPSLSGLAT 197
F+ ++ +DF++ R+ K L N M YYL +Q + G+ +
Sbjct: 132 FFISLAFLKWLDYDFTSLRKGILDKDFQKSLTQINWIMGAYYLVIQNLSYFEYE-QGIQS 190

Query: 198 TTARSIIVLFYFILFLTLLIHLERYVKQNSIEAIVQQKE--YRELINYSQHLGLLYQDIQ 255
TT R +I++FY + F+ ++ L+ Y+K E + Q+++ YRE+ YS+H+ LY++++
Sbjct: 191 TTVRHLILVFYLLFFMGIIfCKLDTYLKDKLHERLNQEQDLRYREMERYSRHIEELYKEVR 250

Query: 256 ELRRLLTTVSSRLKIGIEQITOISITOLTYEGIIiN^KNNAKDDRLDLTCLDKLQVEAIRH 315
R T + + L++GIE+ D+ ++ Y+ +L +D++ DL L ++ A++
Sbjct: 251 SFRHDYTNLLTSLRLGIEEEDMEQIKEIYDSVLKDSSEKLQDNKYDLGRLVNVRDRALKS 310

Query: 316 IVIAKLIEAKNKKLKA/EVSIPNCIATFFLEVVDFTKLLSFIiLDNAIEMSLETKQPCLSIA 375
++ K I+A++K + V +P I + ++DF ++S L DNAIE S+E QP +SIA
Sbjct: 311 L1AGKFI KARDKNI VFNVEVPEE I QVEGVSLLDFLTWSILCDNAI EASVFACQPHVS I A 370

Query: 376 FLDQNHKLVIVIQSSTKQGQDDSQSVFAIPALKKRDDWQFDLRlir^ 435
F + +I++S K+ D +F+ A K ++ L V 1+ + ++++
Sbjct: 371 FFKNGAQETFIIENSIKEEGIDISEIFSFGASSKGEERGVGLYTVMKIVESHPNTSMTT 430

Query: 436 IHDGILTQLIEI 447
D + Q++ +
Sbjct: 431 CQDHVFRQVLTV 442

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.



Example 2831 

A DNA sequence (GASx1593R) was identified in S.pyogenes <SEQ ID 8161> which encodes the amino
acid sequence <SEQ ID 8162>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.28 Transmembrane 2 - 18 ( 1 - 18)

Final Results

bacterial membrane Certainty=0.1510 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2832 

A DNA sequence (GASx1594) was identified in S.pyogenes <SEQ ID 8163> which encodes the amino acid
sequence <SEQ ID 8164>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.93 Transmembrane 76 - 92 ( 76 - 92)

Final Results

bacterial membrane Certainty=0.2572 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF61313 GB:U96166 unknown [Streptococcus cristatus] 
Identities = 31/66 (46%) , Positives = 40/66 (59%) , Gaps = 2/66 (3%) 

Query: 14 LLGRILSKYVGRLTSCIENETTKIRmSRQMJTIGLtQHLLGNLKTVHNPEIILKTINVYS 73
+ G +SK + + E K+ ++ ND IG N LLG+LKTVHNPEI I + VYS
Sbjct: 30 VFGMDVSKTSSEVAILVNGE--KVHGYTILNDAIGFNRLLGDLKTVHNPEIIFEATGVYS 87

Query: 74 RRLQVF 79
RRLQ F
Sbjct: 88 RRLQAF 93

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2833

A DNA sequence (GASx1598) was identified in S.pyogenes <SEQ ID 8165> which encodes the amino acid
sequence <SEQ ID 8166>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2117 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2834 

A DNA sequence (GASx1608) was identified in S.pyogenes <SEQ ID 8167> which encodes the amino acid
sequence <SEQ ID 8168>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2835

A DNA sequence (GASx1619) was identified in S.pyogenes <SEQ ID 8169> which encodes the amino acid
sequence <SEQ ID 8170>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2916 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2836 

A DNA sequence (GASx1621) was identified in S.pyogenes <SEQ ID 8171> which encodes the amino acid
sequence <SEQ ID 8172>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1899 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

alpha subunit [Escherichia coli] 
Identities = 110/211 (52%) , Positives = 153/211 (72%) 

Query: 7 KEITIKEffiVAHVKDGDTIMVGGFMTNGTPEKLIDALVEKGVKDLTLICNDAGFPDKGVGK 66
K +T+++A +DG TIMVGGFM GTP +L++AL+E GV+DLTLI ND F D G+G
Sbjct: 4 KLMTLQDATGFFRDGMTIMVGGFMGIGTPSRLVEALLESGVRDLTLIANDTAFVDTGIGP 63

Query: 67 MVANKQFSTIIASHIGLNREAGRQMTEGETVIDDVPQGTLAERIRSGGFGLGGFLTPTGI 126
++ N + +IASHIG N E GR+M GE + LVPQGTL E+IR GG GLGGFLTPTG+
Sbjct: 64 LIVNGRVRKVIASHIGTNPETGRRMISGEMDWLVPQGTLIEQIRCGGAGLGGFLTPTGV 123

Query: 127 GTEVAKGKEVITIDGKr)YLLEKPLKADV7ALIFANKADKNGNLQYAGSFJ^FNHVMAANAK 186
GT V +GK+ +T+DGK +LLE+PL+AD+ALI A++ D GNL Y S NFN ++A A
Sbjct: 124 GTVVEEGKQTLTLDGKTWLLERPLRADLALIRAHRCDTLGNLTYQLSARNFNPLIALAAD 183

Query: 187 TTIVEAREIVDVGQMDPNFVHTPGIFVNYLV 217
T+VE E+V+ G++ P+ + TPG +++++
Sbjct: 184 ITLVEPDELVETGELQPDHIVTPGAVIDHII 214



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2837

A DNA sequence (GASx1622) was identified in S.pyogenes <SEQ ID 8173> which encodes the amino acid
sequence <SEQ ID 8174>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4668 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD54948 GB:AF157306 acetoacetate:butyrate/acetate coenzyme A
transferase [Clostridium beijerinckii]
Identities = 121/214 (56%) , Positives = 161/214 (74%) , Gaps = 5/214 (2%)

Query: 7 VLSKEEIQTRIAKRVAQELEHNTLVNLGIGLPTKVANYIPEGVTITLQSENGFVGLTGLT 66
VL+KE I AKRVA+EL+ LVNLGIGLPT VANY+P+ + IT +SENG VG+ +
Sbjct: 6 VLAKEII AKRVAKELKKGQLVNLGIGLPTLVANYVPKEMNITFESENGMVGMAQMA 61

Query: 67 DD-HYDPTIVNAGGQPVSIAPGGAFFDSSTSFGIIRGGHVAATVLGALQVDKEASIANYL 125
DP I+NAGG+ V++ P GAFFDSSTSF +IRGGHV VLGAL+VD+E ++AN++
Sbjct: 62 SSGENDPDIINAGGEYOTLLPQGAFFDSSTSFALIRGGHVDVAVLGALEVDEEGNLANWI 121

Query: 126 IPGKWPGMGGAMDLLVGAKKVIVAMEHTNKGKAKIIoDKCTLPLTAQNVVNLIITEMGVF 185
+P K+VPGMGGAMDL +GAKK+ IVAM+HT KGK KI+ KCTLPLTA+ V+LI+TE+ V
Sbjct: 122 VPNKIVPGMGGAmLAIGAKKIIVAMQHTGKGKPKIVKKCTLPLTAKAQVDLIWELCTI 181

Query: 186 EYQDEGLCALEINPDYTFEDVQNVTEVTLIDKTN 219
+ ++GL EI+ D T ++++ +T+ LI N
Sbjct: 182 DVTNDGLLFREIHKDTTIDEIKFLTDADLIIPDN 215



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2838 

A DNA sequence (GASx1628R) was identified in S.pyogenes <SEQ ID 8175> which encodes the amino
acid sequence <SEQ ID 8176>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1243 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2839

A DNA sequence (GASx1639R) was identified in S.pyogenes <SEQ ID 8177> which encodes the amino
acid sequence <SEQ ID 8178>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.65 Transmembrane 55 - 71 ( 44 - 73)
INTEGRAL Likelihood = -7.64 Transmembrane 13 - 29 ( 5 - 31)

Final Results

bacterial membrane Certainty=0.4461 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
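
The "INTEGRAL Likelihood" lines list each predicted membrane-spanning segment with a core range and, in parentheses, an extended range. A minimal sketch of reading them is given below. The names `Segment`, `TM_RE` and `parse_segments` are hypothetical, and ordering by ascending likelihood simply mirrors the listings, where the most negative value appears first.

```python
import re
from typing import NamedTuple, Tuple


class Segment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]      # residues given after "Transmembrane"
    extended: Tuple[int, int]  # residues given in parentheses


# Hypothetical pattern; tolerates spacing variants such as "13- 29" and "( 5- 31)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments, most negative likelihood first."""
    segments = [
        Segment(float(score), (int(a), int(b)), (int(c), int(d)))
        for score, a, b, c, d in TM_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.likelihood)


listing = """
INTEGRAL Likelihood = -8.65 Transmembrane 55 - 71 ( 44 - 73)
INTEGRAL Likelihood = -7.64 Transmembrane 13- 29 ( 5- 31)
"""
segments = parse_segments(listing)
for seg in segments:
    core_length = seg.core[1] - seg.core[0] + 1
    print(seg.likelihood, seg.core, seg.extended, core_length)
```
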

Example 2840 

A DNA sequence (GASx1643) was identified in S.pyogenes <SEQ ID 8179> which encodes the amino acid
sequence <SEQ ID 8180>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0766 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2841

A DNA sequence (GASx1645R) was identified in S.pyogenes <SEQ ID 8181> which encodes the amino
acid sequence <SEQ ID 8182>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2842 

A DNA sequence (GASx1649R) was identified in S.pyogenes <SEQ ID 8183> which encodes the amino
acid sequence <SEQ ID 8184>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0931 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2843 

A DNA sequence (GASx1650) was identified in S.pyogenes <SEQ ID 8185> which encodes the amino acid
sequence <SEQ ID 8186>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5678 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2844

A DNA sequence (GASx1651R) was identified in S.pyogenes <SEQ ID 8187> which encodes the amino
acid sequence <SEQ ID 8188>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2761 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2845 

A DNA sequence (GASx1667R) was identified in S.pyogenes <SEQ ID 8189> which encodes the amino
acid sequence <SEQ ID 8190>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2967 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2846

A DNA sequence (GASx1672) was identified in S.pyogenes <SEQ ID 8191> which encodes the amino acid
sequence <SEQ ID 8192>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.82 Transmembrane 3 - 19 ( 1 - 20)

Final Results

bacterial membrane Certainty=0.2529 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2847 

A DNA sequence (GASx1673R) was identified in S.pyogenes <SEQ ID 8193> which encodes the amino
acid sequence <SEQ ID 8194>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.86 Transmembrane 51 - 67 ( 47 - 75)
INTEGRAL Likelihood = -5.20 Transmembrane 27 - 43 ( 24 - 45)
INTEGRAL Likelihood = -3.66 Transmembrane 112 - 128 ( 112 - 131)

Final Results

bacterial membrane Certainty=0.4545 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF41294 GB:AE002440 conserved hypothetical protein [Neisseria 
meningitidis MC58] 
Identities = 61/148 (41%) , Positives = 96/148 (64%) 

Query: 1 LKKSITNEKAILAQGGQEFGAQNTKFLTLLHIMIYOTAVIFALLKQIKFDGISFLGLLLM 60
L SI +EKA++A+G +++G N+ L +H + Y+ + L F+GIS +G L +
Sbjct: 19 LAVSIKHEKALIAKGAKQYGKTNSTLLAAVHTLYYLACFVWVWLSDTAFNGISLIGTLTV 78

Query: 61 LLSVAVIiYEOTRILGDIWTVKLMIAKDHKYVDHWLFKTIKHPNYFLNIAPELVGIALLCH 120
+ S +L + + LG+IWTVK+ + +H+ WLFKT +HPNYFLNI PEL+GIALLC
Sbjct: 79 MASFVILSLIIKQLGEIWTVKIYILPNHQINRSWLFKTFRHPNYFLNIIPELIGIALLCQ 138

Query: 121 AKITAMLLFPCYIWIYLRIREENKLLA 148
A ++ P Y++V++ RIR+E + +A
Sbjct: 139 AWYVLLIGLPIYLLVLFKRIRQEEQAMA 166

A related GBS gene <SEQ ID 9009> and protein <SEQ ID 9010> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 

McG: Discrim Score: 5.86 

GvH: Signal Score (-7.5): 0.14 
Possible site: 60

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 2 value: -8.23 threshold: 0.0 

INTEGRAL Likelihood = -8.23 Transmembrane 69 - 85 ( 64 - 89) 
INTEGRAL Likelihood = -3.29 Transmembrane 142 - 158 ( 140 - 159) 
PERIPHERAL Likelihood = 1.70 123 
modified ALOM score: 2.15 

*** Reasoning Step: 3 

Final Results

bacterial membrane Certainty=0.4291 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

42.1/64.0% over 168aa

EGAD|177248| conserved hypothetical protein {Neisseria meningitidis} Insert characterized
GP|7379797|emb|CAB84365.1||AL162755 putative integral membrane protein {Neisseria
meningitidis} Insert characterized
GP|7226121|gb|AAF41294.1||AE002440 conserved hypothetical protein {Neisseria meningitidis
MC58} Insert characterized
PIR|F81147|F81147 probable integral membrane protein NMA1102 [imported] - Neisseria meningitidis
(group B strain MD58, group A strain Z2491) Insert characterized

ORF00432(301 - 807 of 1140)

EGAD|177248|NMB0883(1 - 169 of 169) conserved hypothetical protein {Neisseria
meningitidis} GP|7379797|emb|CAB84365.1||AL162755 putative integral membrane protein
{Neisseria meningitidis} GP|7226121|gb|AAF41294.1||AE002440 conserved hypothetical protein
{Neisseria meningitidis MC58} PIR|F81147|F81147 probable integral membrane protein NMA1102
[imported] - Neisseria meningitidis (group B strain MD58, group A strain Z2491)
%Match =19.0 

%Identity =42.0 %Similarity = 63.9 

Matches = 71 Mismatches = 61 Conservative Sub.s = 37 
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237 267 297 327 357 387 417 447 

SSGEYHLLTSDHSLV*IGKAXX*LIXXXEFTMSIIIGLMAAMFIIRLAYLKLSIANEKA^ 



MTMILSILSLFFIIRLLFLAVSIKHEKALIAKGAKQYGKTNSTLLAAVH 
10 20 30 40 



477 507 537 567 597 627 657 687 

1IIYFSSOTEAILTKASFNWSVIGLSLMIFSVFMLHTVTRLLGRIWTVKLMVDKNHQFVDHWLFRWKHPNYFLNIAPE 




60 70 80 90 100 110 120 



717 747 777 807 837 867 897 927 

LLGVTLLCHAJCYTALFVLPIYAFVIYLRIREENLLLKTIIIPNGIKKSRVY*E*DK**T*KSFFVILSQ*EEVFISCFFS 



LIGIALLCQAWYVLLIGLPIYLLVLFKRIRQEEQAMATLF 
140 150 160 
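
The summary figures reported for this alignment can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis. It assumes that %Identity is the number of matches divided by the aligned length in residues, that %Similarity adds the conservative substitutions to the matches, and that the aligned length is the ORF span (301 - 807) divided by three. These assumptions are not stated in the text; they are used here only because they reproduce the reported values. The function name is hypothetical.

```python
def identity_and_similarity(matches, conservative, orf_start, orf_end):
    """Return (%identity, %similarity), each rounded to one decimal place.

    Assumption (not stated in the source): the denominator is the aligned
    ORF span expressed in codons, i.e. (orf_end - orf_start + 1) / 3.
    """
    aligned_residues = (orf_end - orf_start + 1) / 3
    identity = 100.0 * matches / aligned_residues
    similarity = 100.0 * (matches + conservative) / aligned_residues
    return round(identity, 1), round(similarity, 1)


if __name__ == "__main__":
    # Reported for ORF00432 (301 - 807 of 1140):
    # Matches = 71, Conservative Sub.s = 37
    print(identity_and_similarity(71, 37, 301, 807))
```

Under these assumptions the result agrees with the reported %Identity of 42.0 and %Similarity of 63.9. The %Match figure is not reproduced because the text does not indicate how it is derived.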



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2848 

A DNA sequence (GASx1674R) was identified in S.pyogenes <SEQ ID 8195> which encodes the amino
acid sequence <SEQ ID 8196>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2849 

A DNA sequence (GASx1677R) was identified in S.pyogenes <SEQ ID 8197> which encodes the amino
acid sequence <SEQ ID 8198>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results



bacterial cytoplasm Certainty=0.3098 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



Final Results

bacterial membrane Certainty=0.4545 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05126 GB:AP001511 unknown conserved protein [Bacillus halodurans] 
Identities = 249/534 (46%), Positives = 380/534 (70%)

Query: 12 QDIAFHFFGGLGLFLFSIKYMGDGLQQAAGDKLRYYIDKYTSNPFFGILVGIAMSALIQS 71 

Q + F FFGGLG+FLF IKYMGDGLQ+ AG++LR +DK+T+NP G+L GI ++ L+Q+ 
Sbjct: 6 QTLLFMFFGGLGIFLFGIKyMGDGLQKVAGERLRDLLDKFTTNPLMGVTAGIVVTvljLQT 65 

Query: 72 SSGVTVIWGLVSAGLLNLRQAIGI VMGANIGTTITSFLIGFKLGDYALPMIFIGAACLF 131 

S+G TV+T+GLV+AG + L+QAIG++MGANIGTT+T+F+IG K+ +YALP+I +GAA +F 
Sbjct: 66 STGTTVLTIGLVNAGFMTLKQAIGVIMGANIGTTVTAFIIGIKISEYALPIIAVGAALIF 125 

Query: 132 FTSNKKLNNFGRI I FGVGGI FFSLNLMGDAMDPLKSVSAFQNYLATLGDKPFQGVFIGTA 191 

F NKK+NN G++IFG G +F+ LN MG+ ++PL+ + AF + ++ + P GV IGT 
Sbjct: 126 FIKNKKVmiGQVIFGFGTLFYGLNTMGEGLNPLRELQAFADLTVSMSEMPLLGVLIGTI 185 

Query: 192 LTMLIQSSAAIIGILQGLFSGGLLTLQGAIPILLGSNIGTCITAVIAAIGSN1AAKRVAA 251 

T +QSS+A IG+LQ L+ G + L A+P+L G NIGT ITAVLAAIG+++AAKR A 
Sbjct: 186 FTAAVQSSSASIGLLQQLYDQGAMDLFAALPvLFGDNIGTTITAVIAAIGASVAAKRAAL 245 

Query: 252 AHVLFNLIGTIIFMIILVPFTSLMLWLQSKLSLTPEMTIAFSHGSFNITNTILLIPFISL 311 

HV+FNLIGTII +II++PFT + +L +L MTIAF+HG FN++NTI+ PFI + 
Sbjct: 246 THVIFNLIGTIIVLIIIIPFTHFIAYLAEVFALNRPMTIAFAHGIFNVSNTIIQFPFIGI 305 

Query: 312 IAMIVTRLIPGEDEVVKYEALYLDRLLITQAPSIALGNAHKELVHLASYAIQAFEASYSY 371 

IA+IVT+L+PG+D ++Y+A +LD + +P+IALG A +E++ +A ++ + Y 
Sbjct: 306 LAIIVTKLVPGDDFYIEYKAKHLDPRFVGSSPAIALCKJAKQEvIiRMaEFSEKGLLEVSICf 365 

Query: 372 IMTADGKFGEKVKRYERAVDTIDEELTTYLVDISNEftLSPSENEVLAGILDSSRDLERIG 431 

+ K E ++E A++ +D ++T YL+ IS+ +LS ++++ ++D+ RD+ERIG 

Sbjct: 366 MENGQKKHftE^VQFEDAINNLDRKITEYLISISSRSLSAQDSKMHGMLMDTVRDIERIG 425 

Query: 432 DHSESLGILIEGIISKQIGFSISARQELTEMYQLTHCLTLDAIRAI VDSDTDLAQTIVTR 491 

DH E++ L + + ++ S A +Ii EM+ LTH +AI ++ D + A++++ + 
Sbjct: 426 DHIENIVELKDYQKANKVKISEKALHDLQEMFDLTHSTLTEAIMSLETGDLEAARSVIEK 485 

Query: 492 HKEIEEKERRLRKTHIKRLNCGECTAQAGINFIDIISHYTRITDHALNLAEKVL 545 

+ I++ ER+LRK HI R+N G CT AGI F+DI+S+ RI DH++N+AE V+ 
Sbjct: 486 EEHIDQMERKLRKQHIIRVNEGNCTGAAGIVFVDIVSNLERIGDHSVNIAEAVI 539 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2850 

A DNA sequence (GASx1678R) was identified in S.pyogenes <SEQ ID 8199> which encodes the amino
acid sequence <SEQ ID 8200>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2940 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>
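
A "Final Results" block of this kind can be read mechanically. The following is an illustrative sketch only, assuming each row has the form "bacterial <site> Certainty=<value> (<verdict>)" and that the site with the highest certainty is taken as the predicted localization. The function names are hypothetical, and the pattern tolerates stray spaces inside the number.

```python
import re

# One result row: site name, certainty value (possibly with stray spaces), verdict.
ROW = re.compile(r"bacterial\s+(\w+)\s+Certainty=\s*([\d. ]+?)\s*\((.+?)\)")


def parse_final_results(text):
    """Map each localization site to a (certainty, verdict) tuple."""
    results = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            site, raw_value, verdict = m.groups()
            results[site] = (float(raw_value.replace(" ", "")), verdict)
    return results


def predicted_site(results):
    """Return the site with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    block = """\
bacterial cytoplasm Certainty=0.2940 (Affirmative)
bacterial membrane Certainty=0.0000 (Not Clear)
bacterial outside Certainty=0.0000 (Not Clear)
"""
    parsed = parse_final_results(block)
    print(parsed)
    print(predicted_site(parsed))
```

For the values shown above, the sketch selects the cytoplasm, which is the only row marked "Affirmative".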

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2851 

A DNA sequence (GASx1685R) was identified in S.pyogenes <SEQ ID 8201> which encodes the amino
acid sequence <SEQ ID 8202>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.11 Transmembrane 13 - 29 ( 9 - 31)

Final Results 

bacterial membrane Certainty=0.3845 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2852

A DNA sequence (GASx1695R) was identified in S.pyogenes <SEQ ID 8203> which encodes the amino
acid sequence <SEQ ID 8204>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1357 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2853 

A DNA sequence (GASx1698) was identified in S.pyogenes <SEQ ID 8205> which encodes the amino acid
sequence <SEQ ID 8206>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1970 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2854 

A DNA sequence (GASx1713) was identified in S.pyogenes <SEQ ID 8207> which encodes the amino acid
sequence <SEQ ID 8208>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3092 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2855

A DNA sequence (GASx1737) was identified in S.pyogenes <SEQ ID 8209> which encodes the amino acid
sequence <SEQ ID 8210>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1878 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2856 

A DNA sequence (GASx1748R) was identified in S.pyogenes <SEQ ID 8211> which encodes the amino
acid sequence <SEQ ID 8212>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2841 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2857

A DNA sequence (GASx1750R) was identified in S.pyogenes <SEQ ID 8213> which encodes the amino
acid sequence <SEQ ID 8214>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 18 - 34 ( 18 - 34) 

Final Results 

bacterial membrane Certainty=0.1489 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2858 

A DNA sequence (GASx1754) was identified in S.pyogenes <SEQ ID 8215> which encodes the amino acid
sequence <SEQ ID 8216>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2859 

A DNA sequence (GASx1759) was identified in S.pyogenes <SEQ ID 8217> which encodes the amino acid
sequence <SEQ ID 8218>. Analysis of this protein sequence reveals the following:

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm Certainty=0.1534 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2860 

A DNA sequence (GASx1764R) was identified in S.pyogenes <SEQ ID 8219> which encodes the amino
acid sequence <SEQ ID 8220>. Analysis of this protein sequence reveals the following:

Possible site: 29 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.74 Transmembrane 90 - 106 ( 87 - 121) 

INTEGRAL Likelihood = -4.57 Transmembrane 210 - 226 ( 205 - 229) 

INTEGRAL Likelihood = -4.19 Transmembrane 43 - 59 ( 42 - 62) 

INTEGRAL Likelihood = -3.77 Transmembrane 137 - 153 ( 137 - 155) 
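
The INTEGRAL rows above can be tabulated programmatically. The following is a minimal sketch, assuming each row has the form "INTEGRAL Likelihood = <score> Transmembrane <start> - <end>" and, as is conventional for this kind of output, that a more negative likelihood indicates a stronger transmembrane prediction. The `Segment` type and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Captures the likelihood score and the core transmembrane span of one row.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return all predicted segments, ordered by start position."""
    segments = [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in INTEGRAL.finditer(text)
    ]
    return sorted(segments, key=lambda s: s.start)


def strongest(segments):
    """Return the segment with the most negative likelihood (assumed strongest)."""
    return min(segments, key=lambda s: s.likelihood)


if __name__ == "__main__":
    report = """\
INTEGRAL Likelihood = -6.74 Transmembrane 90 - 106 ( 87 - 121)
INTEGRAL Likelihood = -4.57 Transmembrane 210 - 226 ( 205 - 229)
INTEGRAL Likelihood = -4.19 Transmembrane 43 - 59 ( 42 - 62)
INTEGRAL Likelihood = -3.77 Transmembrane 137 - 153 ( 137 - 155)
"""
    for seg in parse_segments(report):
        print(seg.start, seg.end, seg.likelihood)
    print(strongest(parse_segments(report)))
```

Sorting by start position lists the four predicted segments in sequence order; each core span covers 17 residues.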



Final Results 

bacterial membrane Certainty=0.3697 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2861 

A DNA sequence (GASx1768R) was identified in S.pyogenes <SEQ ID 8221> which encodes the amino
acid sequence <SEQ ID 8222>. Analysis of this protein sequence reveals the following:

Possible site: 17

Final Results

bacterial membrane Certainty=0.5946 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB84959 GB:AE000829 conserved protein [Methanobacterium thermoautotrophicum]
Identities = 54/192 (28%), Positives = 90/192 (46%), Gaps = 6/192 (3%)

Query: 7 TKLLLLVIiANACFFFRVDGFLEFIIVIFLIiLLSALNKKKLA--FKLAWYLLMIGLSVI 64
+KL ++V A F D L 1+ + L++ + A F ++ ++ L++I
Sbjct: 32 SKLTVWSATLLSTFISDLTLLIIMGVIFTALIAHSGSLRFAAPFLSFIILFWLVSLAII 91

Query: 65 PLSIFPSYLDHLLSFVSIAGRLWPSLIiAGLITIKTTTIYELVHGLRKWRFPEWLLTLA 124
+ S H + F+S+ F AGL TT +L LR R P + TL
Sbjct: MVL---SGNPHTMGFLSLFFARFFIISAAGLSFAFTTEPQKLAESLRSVRIPGEIVFTLT

V R+IP + E I SLK+R L+ SI+ RP L++P+++ ++ S E+ I
Sbjct: VALRYIPALAVEASSIWDSLKLR-TSLSGSSIIRRPSLLYRGLIIPMIIRTVKISDEVAI 207

Query: 185 ASLTKGLAVNKG 196
A+ T+G +G
Sbjct: 208 AAETRGFNPREG 219





Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2862 

A DNA sequence (GASx1769R) was identified in S.pyogenes <SEQ ID 8223> which encodes the amino
acid sequence <SEQ ID 8224>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.3930 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2863 

A DNA sequence (GASx1776R) was identified in S.pyogenes <SEQ ID 8225> which encodes the amino
acid sequence <SEQ ID 8226>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.37 Transmembrane 4 - 20 ( 1 - 22) 
INTEGRAL Likelihood = -0.43 Transmembrane 261 - 277 ( 261 - 278) 

Final Results 

bacterial membrane Certainty=0.3548 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2864 

A DNA sequence (GASx1777R) was identified in S.pyogenes <SEQ ID 8227> which encodes the amino
acid sequence <SEQ ID 8228>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.17 Transmembrane 1217 - 1233 (1215 - 1235)

Final Results 

bacterial membrane Certainty=0.4270 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF53254 GB:AE003639 CG16974 gene product [Drosophila melanogaster]

Identities = 84/238 (35%), Positives = 133/238 (55%), Gaps = 10/238 (4%)

Query: 516 LRLDHYELTDISLL- - KHAKNITEIJJIjDGNQITEIPKELFSQMKQLRFIJvILRSNHLTYIjiD 573 
L + L++ SLL ++ K + ELHLD +++T +P+ ++ +LR LNL N LT L 
Sbjct: 232 LEMSGNJILSNCSLIJflLQYMKQLQEItfliDRSE 291

Query: 574 KDTFKSNAQLRELYLSSNFIHSLEGGLFQSLHHLEQLDLSKNRIGRLCDNPFEGLSRLTS 633 

+D F +L LYLS N + L LFQ+ L+ LDLS NR+ DN F +L 
Sbjct: 292 RDIFVGALKLERLYLSGNRLSVLPFMLFQTAADLQvLDLSDNRLLSFPDNFFARNGQLRQ 351 


Query: 634 LGFAENSLEEIPEKALEPLTSLNFIDLSQNNIALLP-KTIEKLRALSTIVASRNHITRID 692 

L N L+ I + +L L L +DLSQN+L+++ K E L L + S N++T + 
Sbjct: 352 LHLQRNQLKSIGKHSLYSLRELRQLDLSQNSLSVIDRKAFESLDHLIAIOTSGNNLTLLS 411 

Query: 693 NI S FKNLPKLSVLDLSTNEI SNLPNGI FKQNNQL TKLDFFNNLLTQVEESV 743

+1 F++L L LDLS N+ LP+G+F++ L T ++ F+N +++ +ES+ 

Sbjct: 412 SIIFQSLHALRQLDLSRNQFKQLPSGLFQRQRSLVLLRIDETPIEQFSNWISRYDESL 469 
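
The percentages in the summary line of this alignment follow directly from the quoted fractions. The following is an illustrative sketch only. It assumes that each percentage is the fraction truncated (not rounded) to a whole number, an assumption inferred from the fact that 133/238 is about 55.9% but is reported as 55%. The function name is hypothetical.

```python
import re

# Matches "Identities = 84/238", "Positives = 133/238", "Gaps = 10/238".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def summary_percentages(line):
    """Return {field: (count, length, percent)} for a summary line.

    Assumption: the percentage is truncated to an integer (floor division).
    """
    return {
        name: (int(count), int(length), int(count) * 100 // int(length))
        for name, count, length in FIELD.findall(line)
    }


if __name__ == "__main__":
    line = "Identities = 84/238 (35%), Positives = 133/238 (55%), Gaps = 10/238 (4%)"
    for name, (count, length, percent) in summary_percentages(line).items():
        print(f"{name}: {count}/{length} = {percent}%")
```

With truncation, the recomputed values for this alignment are 35%, 55% and 4%, in agreement with the summary line.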

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2865 

A DNA sequence (GASx1778R) was identified in S.pyogenes <SEQ ID 8229> which encodes the amino
acid sequence <SEQ ID 8230>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.1067 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2866 

A DNA sequence (GASx1779) was identified in S.pyogenes <SEQ ID 8231> which encodes the amino acid
sequence <SEQ ID 8232>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1885 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2867

A DNA sequence (GASx1786R) was identified in S.pyogenes <SEQ ID 8233> which encodes the amino
acid sequence <SEQ ID 8234>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0612 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2868 

A DNA sequence (GASx1790) was identified in S.pyogenes <SEQ ID 8235> which encodes the amino acid
sequence <SEQ ID 8236>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2869

A DNA sequence (GASx1791R) was identified in S.pyogenes <SEQ ID 8237> which encodes the amino
acid sequence <SEQ ID 8238>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 28 - 44 ( 28 - 44)

Final Results 

bacterial membrane Certainty=0.1362 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9155> which encodes the amino acid sequence 
<SEQ ID 9156>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.300 (Affirmative) < succ>

bacterial membrane Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA24923 GB:L06331 endoglycosidase [Chryseobacterium meningosepticum]

Identities = 105/322 (32%), Positives = 153/322 (46%), Gaps = 53/322 (16%)

Query: 106 ADKQAQELAKMKI PEKI PMKPLHGSLYGGYFRTWHDKTSDPTEKDKVNSMGELPKE VDLA 165 
AK++ + +IK + GY+RTW D T + SM LP +D+

Sbjct: 37 AQKSGVTVSAVNLSNLIAYKNSDHQISAGYYRTWRDSA TASGNLPSMRWLPDSLDMV 93 

Query: 166 FI FHDWTKDYSLFWKELATKHVPKLNKQGTRVIRTI PWRFLAGGDNSGIAEDTSKYPNTP 225 
+F D+T + +W L T +VP L+K+GT+VI T+ G NS T+ 
Sbjct: 94 MVFPDYTPPENAYWNTLKTNYVPYLHKRGTKVIITL GDLNSA TTTGGQDS 143

Query: 226 EGNKALAKAI vDEYVYKYNLDGLDVDVEHDSIPKVDKKEDTAGVERSIQVFEEIGKLIGP 285 

G + AK I D++V +YNLDG+D+D+E A+++ ++KGP 

Sbjct: 144 IGYSSWAKGIYDKWVGEYNLDGIDIDIE SSPSGATLTKFVAATKALSKXFGP 195 


Query: 286 KGVDKSRLFIMDSTYMADKNP--LIERGAPYINLLLVQVYGSQGEKGGWEPVSNRPEKTM 343 

K + F+ D+ ++NP + AP N + +Q YG R + 

Sbjct: 196 KS-GTGKTFVYDT NQNPTNFFIQTAPRYNYVFLQAYG RSTINL 237 

Query: 344 EERWQGYSKYIRPEQYMIGFSFYEENAQEGNLWYDINSRKDEDKANGINTDITGTRAERY 403

Y+ YI +Q++ GFSFYEEN GN W D+ + NG TG RA Y 

Sbjct: 238 TTVSGLYAPYI SMKQFLPGFS FYEENGYPGNYKND VRYPQ NG TG-RAYDY 286 

Query: 404 ARWQPKTGG VKGGI FSYAIDRD 425 
ARWQP T G KGG+FSYAI+RD

Sbjct: 287 ARWQPAT - GKKGGVFSYAIERD 307

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2870 

A DNA sequence (GASx1803) was identified in S.pyogenes <SEQ ID 8239> which encodes the amino acid
sequence <SEQ ID 8240>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2099 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2871 

A DNA sequence (GASx1806R) was identified in S.pyogenes <SEQ ID 8241> which encodes the amino
acid sequence <SEQ ID 8242>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2706 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB16126 GB:Z99124 ribosomal protein S18 [Bacillus subtilis]
Identities = 51/77 (66%), Positives = 63/77 (81%)

Query: 1 MAQQRRGGFKRRKKVDFIAANKIEYVDYKDTELLSRFVSERGKILPRRVTGTSAKNQRKV 60

MA RRGG +R+KV + +N I ++DYKD +LL +FVSERGKILPRRVTGT+AK QRK+
Sbjct: 3 MAGGRRGGRAKRRKVCYFTSNGITHIDYKDVDLLKKFVSERGKILPRRVTGTNAKYQRKL 62

Query: 61 TTAIKRARVMALMPYVN 77

T AIKRAR MAL+PYV+
Sbjct: 63 TAAIKRARQMALLPYVS 79

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2872 

A DNA sequence (GASx1809R) was identified in S.pyogenes <SEQ ID 8243> which encodes the amino
acid sequence <SEQ ID 8244>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.59 Transmembrane 70 - 86 ( 66 - 92)

INTEGRAL Likelihood = -6.42 Transmembrane 13 - 29 ( 8 - 33)

INTEGRAL Likelihood = -5.68 Transmembrane 48 - 64 ( 43 - 69)



Final Results 

bacterial membrane Certainty=0.4036 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2873 

A DNA sequence (GASx1813R) was identified in S.pyogenes <SEQ ID 8245> which encodes the amino
acid sequence <SEQ ID 8246>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.51 Transmembrane 127 - 143 ( 113 - 147) 
INTEGRAL Likelihood =-10.46 Transmembrane 151 - 167 ( 149 - 167) 
INTEGRAL Likelihood = -4.41 Transmembrane 59 - 75 ( 57 - 77) 

Final Results 

bacterial membrane Certainty=0.5203 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB98363 GB:U67490 lipoprotein B (lppB) [Methanococcus jannaschii]

Identities = 43/143 (30%), Positives = 68/143 (47%), Gaps = 7/143 (4%)

Query: 25 LLNVLLKIITGVMY--ILYPSFLIFTLWQGMTFQLWLRLLIIPAVGFIALSYIRKRFDFP 82

+ + ++ II+ Y I S +IF + +L L + + F +L Y+ P
Sbjct: 181 IFDAIMPIISKTAYPLIAITSLIIFIKNRKFGMKLIFALFLAFMIAF-SLKYLVNE---P 236

Query: 83 RPYEKWNIKPLIDKDTKGRSMPSRHVFSATMISMCLLRYYVYFGIVCLILSALLAICRVI 142

RPY + L+ + S PS H A ++ LL Y GI+ L + ++A RV
Sbjct: 237 RPYLVLDNVHLLCNEGNEPSFPSGHTTLAFTLATSLLFYSKKLGILFLSWAIIVAYSRVY 296

Query: 143 AGIHYPKDVIVGYLIGLMLGLCL 165

G+HYP DV+ G +IG+ G CL
Sbjct: 297 VGVHYPLDVLAGMIIGIFCG-CL 318

A related GBS gene <SEQ ID 9011> and protein <SEQ ID 9012> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 3.19 
GvH: Signal Score (-7.5): -2.18 

Possible site: 55
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 3 value: -11.78 threshold: 0.0
INTEGRAL Likelihood =-11.78 Transmembrane 126 - 142 ( 112 - 147)
INTEGRAL Likelihood =-11.30 Transmembrane 150 - 166 ( 147 - 166) 
INTEGRAL Likelihood = -4.41 Transmembrane 58 - 74 ( 56 - 76) 
PERIPHERAL Likelihood = 3.29 107 
modified ALOM score: 2.86 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5713 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01020(472 - 792 of 1098) 

EGAD|44548|MJ0374 (213 - 318 of 330) conserved hypothetical protein {Methanococcus
jannaschii} OMNI|MJ0374 conserved hypothetical protein SP|Q57819|Y374_METJA HYPOTHETICAL
PROTEIN MJ0374. GP|1591081|gb|AAB98363.1||U67490 lipoprotein B (lppB) {Methanococcus
jannaschii} PIR|F64346|F64346 hypothetical protein MJ0374 - Methanococcus jannaschii
%Match = 6.8 

%Identity =30.8 %Similarity =53.3 

Matches = 33 Mismatches = 49 Conservative Sub.s = 24 

222 252 282 312 342 372 402 432 

EGVTKYLRRNKHVKHFAYAPQNAGGSGATIVTLG* IMESYEQFYAKLSQPFRKSPQLI ILLNFLLKI VTGMMYILYPSFL 

VIAWLSGIFEMHKLLFTVGTIIGRLPRFLAVAYFGDVLGNINRLSDINIYLFYLINSHYNYIFDAIMPIISKTAYPLIAI 
130 140 150 160 170 180 190 

462 492 522 552 582 612 642 672 

IFTLWQGMTFQLWLRLLIIPAVGFIALSYIRKRLDFPRPYEKraiKPLIYKDTEGRSMPSRHVFSATMISMCLLRYYVYF 
::|: : |: :: :: |||| : |: : I II I I == II I = 

TSLI I F I KNRKFGMKL I FALFIiAFMIAFSLKYLVNEPRPYLVLDNVHLLCNEGNEPSFPSGHTTLAFTIATSLLFYSKKL 
210 220 230 240 250 260 270 

702 732 762 792 822 852 882 912 

GIVCLILSVLLAICRVIAGIHYPKDVIVGYLIGLILGLCLFI*RVRSK*FQKQLDSCTIGLSLR*NGEKRWH*K*QMLHL 

11= I =:==l II 1=111 Ih I =11= I II 
GILFLSWAIIVAYSRVYVGVHYPLDVLAGMIIGIFCG-CLTRIDIYKLIDNI 
290 300 310 320 330 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2874 

A DNA sequence (GASx1815R) was identified in S.pyogenes <SEQ ID 8247> which encodes the amino
acid sequence <SEQ ID 8248>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.0888 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 



The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2875 

A DNA sequence (GASx1825R) was identified in S.pyogenes <SEQ ID 8249> which encodes the amino
acid sequence <SEQ ID 8250>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 7 - 23 ( 7 - 23)

Final Results 

bacterial membrane Certainty=0.1055 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2876

A DNA sequence (GASx1832) was identified in S.pyogenes <SEQ ID 8251> which encodes the amino acid
sequence <SEQ ID 8252>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0918 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2877 

A DNA sequence (GASx1836R) was identified in S.pyogenes <SEQ ID 8253> which encodes the amino
acid sequence <SEQ ID 8254>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4084 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2878 

A DNA sequence (GASx1864R) was identified in S.pyogenes <SEQ ID 8255> which encodes the amino
acid sequence <SEQ ID 8256>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5280 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC36810 GB:L12244 ribosomal protein L28 [Bacillus subtilis]
Identities = 45/62 (72%), Positives = 52/62 (83%)

Query: 1 MAKVCYFTGRKWSGNNRSHAMNQTKRTVKPN]^KVTILVDGKPKKVWASARALKSGKA?E 60
         MA+ C TG+KT +GNNRSHAMN +KRT NLQKV ILV+GKPKKV+ SARALKSGKVE
Sbjct: 1 MARKOTITGKKTTAGNNRSHAMNASKRTWGAISI^ 60

Query: 61 RI 62
          R+
Sbjct: 61 RV 62

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.
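
Each homology hit above is summarized by a line of the form "Identities = a/b (p%), Positives = c/d (q%)". As an illustrative sketch only (the helper names `parse_blast_stats` and `truncated_percent` are hypothetical), the counts can be extracted and the percentages recomputed as a check on the transcription. For the ribosomal protein L28 hit in Example 2878 the reported figures agree with the fractions truncated to whole percent; that this is the rule used by the alignment program in general is an assumption, not something stated in the text.

```python
import re

# One statistic looks like "Identities = 45/62 (72%)". A stray space
# before the comma, as in scanned copies, does not matter here.
STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_stats(line):
    """Map each statistic name to (count, alignment_length, reported_percent)."""
    stats = {}
    for name, count, length, percent in STAT.findall(line):
        stats[name] = (int(count), int(length), int(percent))
    return stats


def truncated_percent(count, length):
    """Percentage of count/length, truncated to a whole number."""
    return (100 * count) // length


# Summary line from Example 2878 (ribosomal protein L28, Bacillus subtilis).
line = "Identities = 45/62 (72%), Positives = 52/62 (83%)"
stats = parse_blast_stats(line)

if __name__ == "__main__":
    for name, (count, length, reported) in stats.items():
        print(name, count, length, reported, truncated_percent(count, length))
```

A recomputed value that differs from the reported one by more than a point would suggest a misread digit in the scanned text rather than a property of the alignment.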

Example 2879 

A DNA sequence (GASx1869) was identified in S.pyogenes <SEQ ID 8257> which encodes the amino acid
sequence <SEQ ID 8258>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1858 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2880

A DNA sequence (GASx1881) was identified in S.pyogenes <SEQ ID 8259> which encodes the amino acid
sequence <SEQ ID 8260>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2752 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif 136-138

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF04356 GB:AF177167 type IC restriction subunit [Streptococcus thermophilus]
Identities = 358/1047 (34%), Positives = 571/1047 (54%), Gaps = 91/1047 (8%)

Query: 7 TELELEKELIHLLETGESQVWYRKELKTEDALWDNFFKILAQNNTQYLNEEPLTASEKEQ 66
         +E +E + I +L E+QWTYR +LK+E+ALW NF L + N L E+PLT E +Q
Sbjct: 4 SEQMIENQFIQILSEKENQWryRPDLKSEEALWQNFRSHLNRINLAVLGEQPLTDKEFKQ 63

Query: 67 IKNQLNFVNY--YFAAKWLAGENGIAKVQVQREDAKLGTIRLEWKADNVAGGTSVYEIA 124
          +K + + + + A++WL GENG+A++ ++RED K + LE + +++GGTS YE+
Sbjct: 64 VKVEFSRLTGTPFLASQWLRGENGVAQILLEREDGK--RVTLEAFRNKDISGGTSSYEW 121

Query: 125 NQVAFSGSRDRRGDVTLLINGLPMIQIELKSQNHQ--CIEAFNQVKKYDKEGQFRGIFST 182
           +QV SR RGDV+LLINGLP+I IELK ++ + ++A+ Q+++Y ++G F+GI++T
Sbjct: 122 HQWPDSSRVDRGDVSLLINGLPIIHIELKKESAKDGFMQAYYQIQRYAEDGFFKGIYAT 181

Query: 183 LQMFWSNKTDTRYIAAAKENKIjNP NFLTQWVDQNNKPQKDLFAFAKEVLS I PRA 237
           Q+ V+SNK DTRY A E+ FL W ++N+ DLF F + VL IP A
Sbjct: 182 TQIMVISNKVDTRYFARPSEDTAEAYARMKKFLENWRTEDNQWSDLFDFTRTVLRIPDA 241

Query: 238 HQMVMTYSVIDDDKKA LILLRPYQIHAIEAVAEASRHRKSGYIWHTTGSGKTLTSYK 294
           H+++ Y+++ DD+K L+ LRPYQIHAI + + + + G+IWH TGSGKT+TS+
Sbjct: 242 HELISQYTILVDDQKNQKFLMALRPYQIHAIRKIRQKAAQHEGGFIWHATGSGKTITSFV 301

Query: 295 VARNILQIP-AVEKSIFVIDRKDLDNQTASAFQSYA QNDIFD--VDETEDT 342
           + + Q V++++ V+DR DLD QT F +A +N + + + ++
Sbjct: 302 ATKLIAQNAIGVDRTVMVVDRTDLDAQTQDEFTKFASEYHTGQTTENSVANTLIVGIKNQ 361

Query: 343 RQLIKNLESS--DRRVVVTTIQKLNAMISQMESYnTPKFKKLKERLAHLNWFVVDECHR 400
           +QL +NL SS + ++VTTIQKL+A + + K E+L ++VF+VDE HR
Sbjct: 362 KQIAQNLLSSKNMOTILVTTIQKLSAAMRSAQQESEEKGSNQFEKLRQEHIVFIVDEAHR 421

Query: 401 AVTPERQRYLTNTFRNSRWYGFTGTPIFVENKRAQLGDLAQTTEQQYGKCLHQYTVKEAI 460
           AV+ E + + NS W+G TGTPIF ENK+ + G A+TT QQYG LH YT+K A+
Sbjct: 422 AVSDEEMKRIKKILPNSTWFGLTGTPIFEENKKQENGTFARTTSQQYGPLLHSYTIKNAM 481

Query: 461 HDKAVLGFQVEYKTTIPD MPEDS 1 PEEAYDHEEHMLA VLD 500
           D AVLGFQVEY + I + +P+D+ +P E Y+ +EH+ +L
Sbjct: 482 DDGAVLGFQVEYHSLISEEDQEVIVTQIJSIKGKLPDDALQQEKLLPTELYETDEHIRTMLQ 541

Query: 501 SI INQSR--KKLGFNNGIGQTFEGLLTvKSIARAQAYYDLMKKVKAGETDLVI SKKVKEK 558
           I N+ KK NG T +LT SIA+A+ Y ++K++K T L+ ++ E+
Sbjct: 542 KIFNRRSV¥KKFKVKNGF-PTMSAILTTHSIAQAKHIYRILKEMKDNGT-LLNGRQFDER 599

Query: 559 h PDFPKVAITYSITENDNASISRQDKMTKNLEDYNHLFGTNFTIDNLQGYNRDLND 614
           DFP+VAIT+S + + D++ + +++Y F + D + YN+++N
Sbjct: 600 HQLIDKDFPRVAITFSTNPDQLEKNEQDDELVEIMKEYEKQFDASPYQDE-KLYNQNINK 658

Query: 615 RLARKKDKFKDRHEQLDLVIVVDRLLTGFDAPCLSTIFIDRQPMKPQHIIQAFSRTNRIF 674
           RLARK+ +++ + LD VIWDRLLTGFD+P + T++IDR+ M Q + +QAFSRTNRI +
Sbjct: 659 RIARKEKQYQSDGQWLDFVIVVDRIiTGFDSPTIQTLYIDRE-MNYQKLLQAFSRTNRIY 717

Query: 675 ESRKHYGQVVTFQTPLRFKEAVDKALSLYSNGGEN-DV3AP-SWEEEKRRFFEKOTVLKN 732
           + K G +V+F+ P +E V L+SN +N D L P +EE K F E T+ K
Sbjct: 718 -TGKDSGLIVSFRKPFTmENVRNTFELFSNEKQNFDQLIPKEYEEVKKEFIECSTLYKQ 776

Query: 733 IVPDPDAFPTIESAQTAFLKQYAKAFQAFDBCLFASVQVYSDFNETLIiSEVGLSDEVIDTY 792
           D P A + Y K +++ L + Q DF E SEV E + Y
Sbjct: 777 SEADLSDNPNDLKTMIAQVSAYQKLEKSYKALRSYDQYEEDFEE--FSEV VEQLPQY 831

Query: 793 KGTYQNVIAEIRKRRED DEAIPEINIDYELESVQMDDINYHYILTLIQAFVD 844
           +G +N+ +I++ ED ++ + EI +L + D ++ YI L++A
Sbjct: 832 QGKTENIKTKIKEMIEDEGHPEEDFEKIiLQEIAFSSQLNATHKDWDSFYINQLLKAIQL 891

Query: 845 QEQEALQERLMDNPmQYIQDLAKSNPAMADSLAELWQDIQKEPKAYEGKSIVYELDNLI 904
           E A+++ + +Q +K +DL ++I + +1
Sbjct: 892 NEAGAVEK--FEKEIQQKDPQIQKMYHTLKDQLVNTTEEI DVAQLKETSI 939

Query: 905 GDKIQRAIKHFADQWKADPDKLAFVATOYHSANSTKQVGMSTLKE-SLDYQAYKEKQGDS 963
           ++IQR ++ A+++ D L Y S T L +L + ++ K G+
Sbjct: 940 QNEIQRQLQKEAEEFGLSFDFLQSAMNEYQSDKKTIPYLTHLLDSMTLSKEEFEAKTGE- 998

Query: 964 AMNKLKYKSQFERELVQFIRDQIQPLK 990
           K + +++ E +Q +Q+Q K
Sbjct: 999 KYRRRTKVLEERLQQNFEQLQKWK 1022

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.
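
The analysis of Example 2880 reports an RGD motif at residues 136-138. As a small sketch (the function `find_motif` and its `offset` parameter are hypothetical names introduced here), the position can be reproduced from the aligned query fragment that begins at residue 125 in the alignment above, using 1-based residue numbering.

```python
def find_motif(fragment, motif="RGD", offset=1):
    """Return (first, last) residue numbers of every motif occurrence.

    fragment: amino acid string without gap characters.
    offset:   residue number of the first character of fragment.
    """
    hits = []
    start = fragment.find(motif)
    while start != -1:
        first = start + offset
        hits.append((first, first + len(motif) - 1))
        # Continue one position further so overlapping hits are not missed.
        start = fragment.find(motif, start + 1)
    return hits


# Ungapped start of the query line "Query: 125 ..." from Example 2880.
fragment = "NQVAFSGSRDRRGDVTLLINGLPMIQIELKSQNHQ"

if __name__ == "__main__":
    print(find_motif(fragment, offset=125))  # prints [(136, 138)]
```

The result agrees with the motif position listed in the analysis, which also confirms that the residue numbering on the query line is 1-based.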

Example 2881

A DNA sequence (GASx1882) was identified in S.pyogenes <SEQ ID 8261> which encodes the amino acid
sequence <SEQ ID 8262>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3653 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB53491 GB:U35629 unknown [Lactococcus lactis subsp. lactis]
Identities = 141/241 (58%), Positives = 178/241 (73%)

Query: 3 KSKQPQYRFDGFEGEWEEKELGDIVQITMGQSPSSQNYTTNPSDYILVQGNADIKNGYVF 62
         K K P+ RF GF EWE ++LGD V+I MGQSP+S+NYT +P+DYILVQGNAD+KNG V
Sbjct: 13 KKKVPELRFKGFTDEWELRKLGDEVRIVMGQSPNSENYTDDPNDYILVQGNADMKNGRvL 72

Query: 63 PRVWTTQITKQADKGDIILSVRAPVGDVGKTNYHVIIGRGVAAIKGNEFIFQILKYLKEI 122
          PRVWTTQ+TKQA+K D+ILSVRAPVGD+GKT Y V+IGRGVAAIKGNEFIFQ L +K
Sbjct: 73 PRVWTTQVTKCAEKDDLILSVRAPVGDIGKTAYDVVIGRGVAAIKGNEFIFQNLGKMKSD 132

Query: 123 GYWKRISTGSTFDSISSSDIKYAKIQIPSLPEQEAIGELFQMVDQLIQLQDQKLATLKEQ 182
           GYW R STGSTF+SI+S+DIK A I +P++ EQ+ IG F+ +D I L +KL LKEQ
Sbjct: 133 GYWTRYSTGSTFES INSTDI KEAI I SVPAIEEQDKIGSFFKQLDNT I ALHQRKLDIiLKEQ 192

Query: 183 KQTFLRKMFPAQGQKVPE IRLQGFKGEWEEKKLREVSTHRSGTAIEKYFDSEGEFKVI S IG 243
           K+ FL+KMFP G KVPE+R GF +WEE+KL +++ +G G++ + G
Sbjct: 193 KKGFLQKMFPKNGAKVPELRFAGFADDWEERKLGDITKISTGKLDANAMVENGKYDFYTSG 253

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2882 

A DNA sequence (GASx1883) was identified in S.pyogenes <SEQ ID 8263> which encodes the amino acid
sequence <SEQ ID 8264>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4318 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF04357 GB:AF177167 type IC modification subunit [Streptococcus thermophilus]
Identities = 293/523 (56%), Positives = 377/523 (72%), Gaps = 6/523 (1%)

Query: 6 TSLRQALWHSADQIiRGQMDANDYKNyLLGLIFYKHLSDKLLLAVCDNLEKHFNTFTEAQK 65
         TSL Q LW SAD LRG+MDA+ + YKNYLLGL IFYK+LSDK L V + +TF E
Sbjct: 3 TSraCXJLWASADILRGKMDASEYKNYLLGLIFYKYLSDKQLREVYEQENGKTDTFPEERST 62

Query: 66 I FEDAYQDEGLKDDLISVVTGDLGYFIEPTLTFEKLIQDVYHNTFQLESLaQGFRDI 122
          + F + Y+++ KDDLI + GYFI+P F + F L h GF ++
Sbjct: 63 LYAGFMEWYEED--KDDLIENIQPRQGYFIQPDRLFYHYRIKADNYEFNLTDLQAGFNEIj 120

Query: 123 EQSGEDFENLFEDIDLYSKKLGSTPQKQNQTISNVMKTIiNEIDFEAVDGDTLGDAYEYLI 182
           E+ GE+F LF DIDL S KLGS Q++N TI+ V++ L+EID +GD +GDAYEYLI
Sbjct: 121 ERQGEEFSGLFSDIDLNSTKLGSNAQQRNVTITEVLRALDEIDLFEHNGDVIGDAYEYLI 180

Query: 183 GEFASESGKKAGEFYTPQAVSHLMTQIVFLGREDQKGMTLYDPAMGSGSLLLNAKKYSNQ 242
           G FA+ +GKKAGEFYTPQAVS +M++I +G+E + +YDPAMGSGSL+LN ++Y
Sbjct: 181 GMFAAGAGKKAGEFYTPQAVSRIMSEITSIGQESRVPFHIYDPAMGSGSLMLNIRRYLIH 240

Query: 243 SDTVSYYGQEINTSTYNLARMNMMLHGVAIENQHLSNADTLDADWPTDEPINFDGVLMNP 302
           + V Y+GQE+NT+T+NLARMN++LHGV E +L+N DTLDADWP++EP FD V+MNP
Sbjct: 241 PNQTOYHGQELNTTTFNLARMISnjlLHGvDKERM 300

Query: 303 PYSLKWSATAGFLTDPRFSSYGVLAPKSKADFAFLLHGFYHLKNTGTMAIVLPHGVLFRG 362
           PYS KWSA FL+DPRF +G LAPKSKADFAFLLHGFYHLK +GTM IVLPHGVLFRG
Sbjct: 301 PYSAKWSAADKFLSDPRFERFGKLAPKSKADFAFLLHGFYHLKESGTMGIVLPHGVLFRG 360

Query: 363 AAEGKIRQKLLEQGAIDTIIGLPSNIFYNTSIPTTIIILKKNRTNKDVFFIDASKEFDKG 422
           AEG IRQ LLE GAID +IGLP+NIF+ TS I PTT+ I I LKKNR+ +DV FIDAS++F+K
Sbjct: 361 GAEGTIRQALLEMGAIDAVIGLPANIFFGTSIPTTVIILKKNRSRRDVLFIDASQDFEKQ 420

Query: 423 KNQNTMTDNHIKKILDAYKSRDNSDKFSYIJ^FDEIIENDYNIJSriPRYVDTFEEVPVKPL 482
           KNQN + D HI KI+ YK R++ ++++++ASFDEI END+NLNI PRYVDTFEE L
Sbjct: 421 KNQNVLLDEHIDKIVSTYKKREDIERYAHVASFDEIQENDFNLNIPRYVDTFEEEEPVDL 480

Query: 483 PELAKQLSD IDQE IAKTNAKLDQLMKQLVGTTKEAQDELDTFR 525
           E+ L I++E+ + L L+ ++E Q +++ R
Sbjct: 481 VEVNTNLLKINEELVQQEQTLLSLINDF-SESEENQAMIESMR 522

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2883

A DNA sequence (GASx1886R) was identified in S.pyogenes <SEQ ID 8265> which encodes the amino
acid sequence <SEQ ID 8266>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.17 Transmembrane 155 - 171 ( 147 - 173)

INTEGRAL Likelihood = -7.22 Transmembrane 14 - 30 ( 11 - 33)

INTEGRAL Likelihood = -7.17 Transmembrane 182 - 198 ( 179 - 205)

INTEGRAL Likelihood = -5.68 Transmembrane 132 - 148 ( 128 - 152)

INTEGRAL Likelihood = -4.14 Transmembrane 46 - 62 ( 43 - 62)

INTEGRAL Likelihood = -3.50 Transmembrane 73 - 89 ( 73 - 90)

INTEGRAL Likelihood = -0.96 Transmembrane 95 - 111 ( 95 - 111)

Final Results

bacterial membrane Certainty=0.4270 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2884

A DNA sequence (GASx1890R) was identified in S.pyogenes <SEQ ID 8267> which encodes the amino
acid sequence <SEQ ID 8268>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4757 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif 339-341

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA62650 GB:L37110 clyM [Plasmid pAD1]
Identities = 127/492 (25%), Positives = 230/492 (45%), Gaps = 30/492 (6%)

Query: 46 KLFYSEFENQLFETIMFLSMKTLVLDINHFSKEIENK SEAYEQYIQQ- IREENGIN 100
          K F L + ++ L+ KTLVLD++ F K K S+ + Y+++ + I
Sbjct: 135 KEFIINLLENLTQELIHLTSKTLVLDLHTFKKNEPLKGNDSSKRFIYYLKKRFNSKKDII 194

Query: 101 HFFDRYPYLLKQINKEVGLIEESYSLLFDRFLEDLSEIKSCFNI - SEPLSNVAFSLGDSH 159
           F+ YP L++ + ++ + R EDL I++CFNI S L++++ S GDSH
Sbjct: 195 AFYTCYPELMRITVVRMRYFLDNTKQMLIRVTEDLPSIQNCFNIQSSELNSISESQGDSH 254

Query: 160 SKKQTVVKIAFKE-KSVYYKPKSYHSHSILLELTSLLKSSNIPSFSLPKSLVKADYCWQL 218
           S+ +TV + F + K + YKPK +S+L+ L +K++Y++
Sbjct: 255 SRGKTVSTLTFSDGKKIVYKPK- INSENKLRDFFEFLNKELEADIYIVKKVTRNTYFYEE 313

Query: 219 GVAYTSSNK-DEVAKIYFKYGVLAAFSEIFSITDLHMENVIVSGGDLYLIDVETFFQRKL 277
           + N +EV K Y +YG L + +F++TDLH EN+I G +ID ETFFQ+ +
Sbjct: 314 YIDNIEINNIEEVKKYYERYGKLIGIAFLFNVTDLHYENIIAHGEYPVIIDNETFFQQNI 373

Query: 278 NVQNQNFEGITVDTYQRIYETSLSNGLFP VQFEKNSAPOTSGISRKGGKRQKGKYEL 334
           ++ N TVD + ++ + GL P ++ + +S +S K Q +++
Sbjct: 374 PIEFGN--SATVDAKYKYLDS IMVTGLVPYIjAMKDKSDSKDEG VNLSALNFKEQSVPFKI 431

Query: 335 I NKNRGDLKLVKVDYFQEDRFNIPTLNGKVVEPLDYANEIISGFRECYIFLLSQRSK 391
           + N +++ ++ NP+N+++Y I++G + + + K
Sbjct: 432 LKIKNTFTDEMRFEYQTHIMDTAKOTPIMNNEKISFISYEKYIVTGMKSILMKAKDSKKK 491

Query: 392 IKEIV-EGFPELKSRVPFRNTSDYGKFLQASTNPKYLFS EKKRKNLFSILYETKHT 446
           1+ LRRTYL+S+P + EK N+++ Y+ K +
Sbjct: 492 ILAYII^LQNLIVRNVIRPTQRYADMLEFSYHPNCFSimiEREKVLHl^AYPYKNKKV 551

Query: 447 EHFIVDNEIKDLMNGDIP-YFSMDTRGNVYNSVGTLIGNLGDTTSL FDSITILNDER 502
           H+ E DL++GDIP +++ ++ ++ S G L+ + ++L +1 L DE
Sbjct: 552 VHY EFSDLIDGDIPIFYNNISKTSLIASDGCLVEDFYQESALNRCLNKINDLCDED 607

Query: 503 LKFTCELLEIVL 514
           + LEI L
Sbjct: 608 ISIQTVWLEIAL 619

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2885 

A DNA sequence (GASx1891R) was identified in S.pyogenes <SEQ ID 8269> which encodes the amino
acid sequence <SEQ ID 8270>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3487 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA02867 GB:L07740 salivaricin A [Streptococcus salivarius]
Identities = 46/51 (90%), Positives = 48/51 (93%)

Query: 1 MSFMKNSKDILTNAIEEVSEKELMEVAGGKKGSGWFATITDDCPNSVFVCC 51
         M+ MKNSKDIL NAIEEVSEKELMEVAGGK+GSGW ATITDDCPNSVFVCC
Sbjct: 1 MNAMKNSKDILNNAIEEVSEKELMEVAGGKRGSGWIATITDDCPNSVFVCC 51

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2886 

A DNA sequence (GASx1901R) was identified in S.pyogenes <SEQ ID 8271> which encodes the amino
acid sequence <SEQ ID 8272>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.59 Transmembrane 3 - 19 ( 1 - 20)

Final Results

bacterial membrane Certainty=0.1638 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2887 

A DNA sequence (GASx1905R) was identified in S.pyogenes <SEQ ID 8273> which encodes the amino
acid sequence <SEQ ID 8274>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.48 Transmembrane 38 - 54 ( 37 - 54)

Final Results

bacterial membrane Certainty=0.1192 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2888

A DNA sequence (GASx1911R) was identified in S.pyogenes <SEQ ID 8275> which encodes the amino
acid sequence <SEQ ID 8276>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -10.40 Transmembrane 27 - 43 ( 22 - 48)

INTEGRAL Likelihood = -9.82 Transmembrane 52 - 68 ( 50 - 74)

INTEGRAL Likelihood = -7.27 Transmembrane 113 - 129 ( 111 - 134)

INTEGRAL Likelihood = -1.97 Transmembrane 137 - 153 ( 135 - 153)

Final Results

bacterial membrane Certainty=0.5161 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2889

A DNA sequence (GASx1915R) was identified in S.pyogenes <SEQ ID 8277> which encodes the amino
acid sequence <SEQ ID 8278>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -10.77 Transmembrane 242 - 258 ( 238 - 262)

Final Results

bacterial membrane Certainty=0.5310 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.
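
The "INTEGRAL" lines in Examples 2888 and 2889 each list a likelihood score, a predicted transmembrane segment, and a second range in parentheses. The sketch below, offered only as an illustration, shows one way such lines might be tabulated. The names `parse_tm` and `strongest`, and the dictionary keys, are hypothetical. The key `paren_range` simply records the parenthesized range without interpreting it, and treating the most negative likelihood as the strongest prediction is an assumption suggested by the examples, where more negative scores accompany higher membrane certainty.

```python
import re

# Matches e.g. "INTEGRAL Likelihood =-10.40 Transmembrane 27 - 43 ( 22 - 48)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm(text):
    """Return one dict per INTEGRAL line found in text."""
    segments = []
    for score, seg_a, seg_b, par_a, par_b in TM_LINE.findall(text):
        segments.append({
            "likelihood": float(score),
            "segment": (int(seg_a), int(seg_b)),
            "paren_range": (int(par_a), int(par_b)),
        })
    return segments


def strongest(segments):
    """Return the segment with the most negative likelihood, or None."""
    if not segments:
        return None
    return min(segments, key=lambda seg: seg["likelihood"])


# Lines from Example 2888.
example = """
INTEGRAL Likelihood = -10.40 Transmembrane 27 - 43 ( 22 - 48)
INTEGRAL Likelihood = -9.82 Transmembrane 52 - 68 ( 50 - 74)
INTEGRAL Likelihood = -7.27 Transmembrane 113 - 129 ( 111 - 134)
INTEGRAL Likelihood = -1.97 Transmembrane 137 - 153 ( 135 - 153)
"""

if __name__ == "__main__":
    segs = parse_tm(example)
    print(len(segs), strongest(segs)["segment"])
```

For Example 2888 this yields four segments, with the 27 - 43 segment carrying the most negative score.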

Example 2890 

A DNA sequence (GASx1918R) was identified in S.pyogenes <SEQ ID 8279> which encodes the amino
acid sequence <SEQ ID 8280>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.32 Transmembrane 40 - 56 ( 39 - 60)

Final Results

bacterial membrane Certainty=0.3930 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2891

A DNA sequence (GASx1923R) was identified in S.pyogenes <SEQ ID 8281> which encodes the amino
acid sequence <SEQ ID 8282>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -12.26 Transmembrane 20 - 36 ( 13 - 42)

Final Results

bacterial membrane Certainty=0.5904 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2892 

A DNA sequence (GASx1926) was identified in S.pyogenes <SEQ ID 8283> which encodes the amino acid
sequence <SEQ ID 8284>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2322 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2893 

A DNA sequence (GASx1928R) was identified in S.pyogenes <SEQ ID 8285> which encodes the amino
acid sequence <SEQ ID 8286>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3395 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2894

A DNA sequence (GASx1929R) was identified in S.pyogenes <SEQ ID 8287> which encodes the amino
acid sequence <SEQ ID 8288>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.86 Transmembrane 17 - 33 ( 15 - 33)

Final Results

bacterial membrane Certainty=0.1744 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2895 

A DNA sequence (GASx1931R) was identified in S.pyogenes <SEQ ID 8289> which encodes the amino
acid sequence <SEQ ID 8290>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0551 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2896

A DNA sequence (GASx1941R) was identified in S.pyogenes <SEQ ID 8291> which encodes the amino
acid sequence <SEQ ID 8292>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2377 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2897 

A DNA sequence (GASx1949) was identified in S.pyogenes <SEQ ID 8293> which encodes the amino acid
sequence <SEQ ID 8294>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0262 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2898

A DNA sequence (GASx1951R) was identified in S.pyogenes <SEQ ID 8295> which encodes the amino
acid sequence <SEQ ID 8296>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1330 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2899 

A DNA sequence (GASx1953) was identified in S.pyogenes <SEQ ID 8297> which encodes the amino acid
sequence <SEQ ID 8298>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2900 

A DNA sequence (GASx1957) was identified in S.pyogenes <SEQ ID 8299> which encodes the amino acid
sequence <SEQ ID 8300>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2409 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2901

A DNA sequence (GASx1969) was identified in S.pyogenes <SEQ ID 8301> which encodes the amino acid
sequence <SEQ ID 8302>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.28 Transmembrane 7 - 23 ( 7 - 23)

Final Results

bacterial membrane Certainty=0.1914 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2902 

A DNA sequence (GASx1971R) was identified in S.pyogenes <SEQ ID 8303> which encodes the amino
acid sequence <SEQ ID 8304>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1545 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes could be useful
antigens for vaccines or diagnostics.

Example 2903 

A DNA sequence (GASx1973) was identified in S.pyogenes <SEQ ID 8305> which encodes the amino acid 
sequence <SEQ ID 8306>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.44 Transmembrane 31 - 47 ( 31 - 48) 

Final Results 

bacterial membrane Certainty=0.1977 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB51744 GB:AJ245405 speX [Streptococcus pyogenes] 

Identities = 236/256 (92%) , Positives = 243/256 (94%) 

Query: 3 MIISFESVILKHNKIITPEKRLFMKKTKLIFSFTSIFIAIISRPVFGLEVDNNSLLRNIY 62 
MIISFESVILKHNKIITPEKRLFMKKTKLIFSFTSIFIAIISRPVFGLEVDNNSLLRNIY 
Sbjct: 1 MIISFESVILKHNKIITPEKRLFMKKTKLIFSFTSIFIAIISRPVFGLEVDNNSLLRNIY 60 

Query: 63 STIVYEYSDWIDFKTSHl^OTKrajDvRDARDFFINSEMDEYAANDFKDGDKIAMFSVPF 122 

STIVYEYSD VIDFKTSHNLVTKKLDVRDARDFFINSEMDEYAANDFK GDKIA+ FSVPF 
Sbjct: 61 STIVYEYSDIVIDFKTSHNLVTKKLDVRDARDFFINSEMDEYAANDFKTGDKIAVFSVPF 120 


Query: 123 DWNYLSEGKVIAYTYGGMTPYQEEPMSKNIPvOT^WINRKQIPVPYNQISTNKTTVTAQEI 182 

DWNYLS+GKV AYTYGG+TPYQ+ K VNLWIN KQI VPYN+ISTNKTTVTAQEI 
Sbjct: 121 DWNYLSKGKOTAYTYGGITPYQKLQYLKISLVNLWINGKQISVPYNEISTOKTTVTAQEI 180 

Query: 183 DLKVRKFLISQHQLYSSGSSYKSGKLVFHTNDNSDKYSLDLFYVGYRDKESIFKVYKDNK 242 
DLKVRKFLI+QHQLYSSGSSYKSG+LVFHTNDNSDKYS DLFYVGYRDKESIFKVYKDNK 
Sbjct: 181 DLKVRKFLIAQHQLYSSGSSYKSGRLVFHTNDNSDKYSFDLFYVGYRDKESIFKVYKDNK 240 

Query: 243 SFNIDKIGHLDIEIDS 258 
SFNIDKIGHLDIEIDS 
Sbjct: 241 SFNIDKIGHLDIEIDS 256 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
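
The homology listings summarize each alignment as "Identities = n/m (p%)", optionally followed by "Positives" and "Gaps" in the same form. The sketch below parses such a summary line and recomputes the percentages. The use of truncating integer division reflects what is observed in the figures printed in these examples (for instance 243/256 is reported as 94%) and is an assumption about the reporting, not a documented property of the search program. The names `parse_blast_summary` and `percent` are hypothetical.

```python
import re

# One field of a summary line, e.g. "Identities = 236/256 (92%)".
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_blast_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(reported))
        for name, count, length, reported in SUMMARY_FIELD.findall(line)
    }


def percent(count, length):
    """Percentage with truncation, matching the figures seen in the listings."""
    return count * 100 // length


if __name__ == "__main__":
    # Summary line copied from Example 2903.
    summary = parse_blast_summary(
        "Identities = 236/256 (92%) , Positives = 243/256 (94%)"
    )
    for name, (count, length, reported) in summary.items():
        print(name, count, length, reported, percent(count, length))
```

A mismatch between the recomputed and the printed percentage is a convenient flag for a transcription error in the counts.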

Example 2904 

A DNA sequence (GASx1974R) was identified in S.pyogenes <SEQ ID 8307> which encodes the amino 
acid sequence <SEQ ID 8308>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2022 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2905 

A DNA sequence (GASx1983) was identified in S.pyogenes <SEQ ID 8309> which encodes the amino acid 
sequence <SEQ ID 8310>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0989 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2906 

A DNA sequence (GASx1987) was identified in S.pyogenes <SEQ ID 8311> which encodes the amino acid 
sequence <SEQ ID 8312>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2389 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2907 

A DNA sequence (GASx1988) was identified in S.pyogenes <SEQ ID 8313> which encodes the amino acid 
sequence <SEQ ID 8314>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.5904 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB16031 GB:AB030747 transposase [Streptococcus pyogenes] 
Identities = 22/24 (91%) , Positives = 23/24 (95%) 

Query: 1 LERLFGTAKEYHNLCYTREKGKSK 24 

+ERLFGTAKEYHNL YTREKGKSK 
Sbjct: 399 IERLFGTAKEYHNLRYTREKGKSK 422 
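
The identity count reported for the alignment above can be re-derived directly from the two aligned strings. The following sketch counts matching positions in the Query and Sbjct rows as printed; the helper names `count_identities` and `mismatch_positions` are illustrative assumptions. Counting "Positives" would additionally require a substitution matrix and is not attempted here.

```python
def count_identities(query, subject):
    """Count aligned positions where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s and q != "-" for q, s in zip(query, subject))


def mismatch_positions(query, subject):
    """Return 1-based alignment columns where the two rows differ."""
    return [i for i, (q, s) in enumerate(zip(query, subject), start=1) if q != s]


# Aligned rows copied from the transposase alignment in Example 2907.
QUERY = "LERLFGTAKEYHNLCYTREKGKSK"
SBJCT = "IERLFGTAKEYHNLRYTREKGKSK"

if __name__ == "__main__":
    identical = count_identities(QUERY, SBJCT)
    print(identical, len(QUERY), mismatch_positions(QUERY, SBJCT))
```

The two rows differ only at alignment columns 1 and 15, giving 22 identical positions out of 24, in agreement with the printed summary.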

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2908 

A DNA sequence (GASx1990R) was identified in S.pyogenes <SEQ ID 8315> which encodes the amino 
acid sequence <SEQ ID 8316>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2909 

A DNA sequence (GASx1991) was identified in S.pyogenes <SEQ ID 8317> which encodes the amino acid 
sequence <SEQ ID 8318>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.16 Transmembrane 2 - 18 ( 1 - 18) 

Final Results 

bacterial membrane Certainty=0.1065 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2910 

A DNA sequence (GASx1994) was identified in S.pyogenes <SEQ ID 8319> which encodes the amino acid 
sequence <SEQ ID 8320>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.44 Transmembrane 28 - 44 ( 28 - 44) 

Final Results 

bacterial membrane Certainty=0.1574 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2911 

A DNA sequence (GASx1996) was identified in S.pyogenes <SEQ ID 8321> which encodes the amino acid 
sequence <SEQ ID 8322>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1076 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2912 

A DNA sequence (GASx1997R) was identified in S.pyogenes <SEQ ID 8323> which encodes the amino 
acid sequence <SEQ ID 8324>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.96 Transmembrane 53 - 69 ( 49 - 75) 

INTEGRAL Likelihood = -2.34 Transmembrane 24 - 40 ( 24 - 43) 

Final Results 

bacterial membrane Certainty=0.4185 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2913 

A DNA sequence (GASx2007R) was identified in S.pyogenes <SEQ ID 8325> which encodes the amino 
acid sequence <SEQ ID 8326>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.64 Transmembrane 46 - 62 ( 43 - 65) 

Final Results 

bacterial membrane Certainty=0.3654 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB97959 GB:U96166 ATP-binding cassette lipoprotein 
[Streptococcus cristatus] 
Identities = 37/60 (61%) , Positives = 42/60 (69%) , Gaps = 1/60 (1%) 


Query: 59 FLTACGTKIODSKKEEVKEIKMSDIKDDAVSKKTKOTDGEEVTEYTTKDGNVIQIPAGNEE 118 

FL ACG+K KE + + K D K DAV +KTK VDG+EVTEYT DGNVIQIPA EE 
Sbjct: 12 FLAACGSKNADNKE - 1 SDGK3WDFKKDAVDQKTKTVDGKEOTEYTMPDGNVI QI PADGEE 70 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2914 

A DNA sequence (GASx2009) was identified in S.pyogenes <SEQ ID 8327> which encodes the amino acid 
sequence <SEQ ID 8328>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1246 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2915 

A DNA sequence (GASx2010) was identified in S.pyogenes <SEQ ID 8329> which encodes the amino acid 
sequence <SEQ ID 8330>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2549 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2916 

A DNA sequence (GASx2012R) was identified in S.pyogenes <SEQ ID 8331> which encodes the amino 
acid sequence <SEQ ID 8332>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3307 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA27007 GB:L26141 pyrogenic exotoxin B [Streptococcus pyogenes] 
Identities = 40/102 (39%) , Positives = 57/102 (55%) , Gaps = 7/102 (6%) 

Query: 2 EMHFVRTEPEARRIAETFCAENTQTKTPMRVQQLSYPSDTDHSGGEL YIYALSPA 56 

+ +F R E EA+ A TF ++ K R + D + GGEL YIY +S 

Sbjct: 28 DQNFARNEKEAKDSAITFIQKSAAIKAGARSAE-DIKLDKVNLGGELSGSNMYIYNISTG 86 

Query: 57 GFIIVSGDTRAHTILGYSFDNNLDLN-HDNVRSMIEAYQKQI 97 

GF+IVSGD R+ ILGYS + D+N +N+ S +E+Y +QI 
Sbjct: 87 GFVIVSGDKRSPEILGYSTSGSFDVNGKENIASFMESYVEQI 128 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2917 

A DNA sequence (GASx2013R) was identified in S.pyogenes <SEQ ID 8333> which encodes the amino 
acid sequence <SEQ ID 8334>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2918 

A DNA sequence (GASx2014R) was identified in S.pyogenes <SEQ ID 8335> which encodes the amino 
acid sequence <SEQ ID 8336>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1392 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2919 

A DNA sequence (GASx2015) was identified in S.pyogenes <SEQ ID 8337> which encodes the amino acid 
sequence <SEQ ID 8338>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.75 Transmembrane 18 - 34 ( 17 - 37) 

Final Results 

bacterial membrane Certainty=0.1702 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2920 

A DNA sequence (GASx2018) was identified in S.pyogenes <SEQ ID 8339> which encodes the amino acid 
sequence <SEQ ID 8340>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.84 Transmembrane 23 - 39 ( 22 - 40) 

Final Results 

bacterial membrane Certainty=0.3336 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2921 

A DNA sequence (GASx2019) was identified in S.pyogenes <SEQ ID 8341> which encodes the amino acid 
sequence <SEQ ID 8342>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0669 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC98898 GB:AF023179 low temperature requirement C protein 
[Listeria monocytogenes] 
Identities = 95/144 (65%) , Positives = 117/144 (80%) 

Query: 15 IAERGVSLE&IAELVLFLQNDYI PNLTMAECLES VEAVIiAKREVQNAI ITGVELDRLAEA 74 

h ERGV ++ IAELVLFLQ Y P L + C ++VE VL KREVQNA++TG++LD +AE 
Sbjct: 16 L IERG VE I DDIAEL VLFLQQKYHPGLELDI CRQNVEHVLRKREVQNAVLTGI QLD VMAEK 75 

Query: 75 NQLSEPLLSILKTDQGLYGIDEILALSIVNLYGSIGFTNYGYLDKTKPGIVDKLNHKDGY 134 

+L +PL +1+ D+GLYG+DE I LALS I VN+YGS IGFTNYGY+DK KPGI+ KLN DG 
Sbjct: 76 GELVQPLQNIISADEGLYGVDEILALSIvNVYGSIGFTOTGYIDK^nCPGIIAKLlffiHDG^ 135 

Query: 135 SCHTFLDDIVSAIAAAAASRIAHN 158 

+ HTFLDDIV AIAAAAASR+AH+ 
Sbjct: 136 AVHTFLDDIVGAIAAAAASRLAHS 159 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2922 

A DNA sequence (GASx2030) was identified in S.pyogenes <SEQ ID 8343> which encodes the amino acid 
sequence <SEQ ID 8344>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0320 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2923 

A DNA sequence (GASx2031) was identified in S.pyogenes <SEQ ID 8345> which encodes the amino acid 
sequence <SEQ ID 8346>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0583 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2924 

A DNA sequence (GASx2032R) was identified in S.pyogenes <SEQ ID 8347> which encodes the amino 
acid sequence <SEQ ID 8348>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.76 Transmembrane 27 - 43 ( 26 - 43) 

Final Results 

bacterial membrane Certainty=0.2105 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

A related GBS gene <SEQ ID 8467> and protein <SEQ ID 8468> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -11.19 
GvH: Signal Score (-7.5): -4.94 
Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 1 value: -4.19 threshold: 0.0 

INTEGRAL Likelihood = -4.19 Transmembrane 25 - 41 ( 25 - 42) 
PERIPHERAL Likelihood = 13.26 41 
modified ALOM score: 1.34 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2678 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01616(304 - 429 of 771) 

SP|O06442|SECE_STAAU(7 - 48 of 60) PREPROTEIN TRANSLOCASE SECE SUBUNIT. 
GP|2078376|gb|AAB54017.1||U96619 SecE {Staphylococcus aureus} 
%Match = 5.4 

%Identity = 26.2 %Similarity = 57.1 
Matches = 11 Mismatches = 18 Conservative Sub.s = 13 

99 129 159 189 219 249 279 309 

RIIQIMLK*HLWRRYGTKESKPSVYRMRKPKLLNRSK*HPQAOTT 

I 

MAKKESFF 

339 369 399 429 459 489 519 549 

KGIFQ VLRDTTWPNRKQRWKDFIS ILEYTVFFTIVIYIFDKLLAAGVMDL1NRF* * * I ILDRNNPNP* ILLRVFCVENNI 

||: = hll : = = =1 : == :|| : | :| 
KGVKSEMEKTSWPTKEELFKYTVIWSTVIFFLVFFYALDLGITALKNLLPG 
20 30 40 50 60 

SEQ ID 8468 (GBS396) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 83 (lane 9; MW 35kDa). 

GBS396-GST was purified as shown in Figure 217, lane 8. 

Example 2925 

A DNA sequence (GASx2034R) was identified in S.pyogenes <SEQ ID 8349> which encodes the amino 
acid sequence <SEQ ID 8350>. Analysis of the protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.59 Transmembrane 53 - 69 ( 53 - 70) 

Final Results 

bacterial membrane Certainty=0.1235 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2926 

A DNA sequence (GASx2035) was identified in S.pyogenes <SEQ ID 8351> which encodes the amino acid 
sequence <SEQ ID 8352>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2928 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2927 

A DNA sequence (GASx2042R) was identified in S.pyogenes <SEQ ID 8353> which encodes the amino 
acid sequence <SEQ ID 8354>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2547 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2928 

A DNA sequence (GASx2043) was identified in S.pyogenes <SEQ ID 8355> which encodes the amino acid 
sequence <SEQ ID 8356>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3289 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2929 

A DNA sequence (GASx2049) was identified in S.pyogenes <SEQ ID 8357> which encodes the amino acid 
sequence <SEQ ID 8358>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4014 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2930 

A DNA sequence (GASx2052) was identified in S.pyogenes <SEQ ID 8359> which encodes the amino acid 
sequence <SEQ ID 8360>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2931 

A DNA sequence (GASx2055R) was identified in S.pyogenes <SEQ ID 8361> which encodes the amino 
acid sequence <SEQ ID 8362>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3048 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05703 GB:AP001513 imidazolonepropionase 
(imidazolone-5-propionate hydrolase) [Bacillus halodurans] 
Identities = 203/416 (48%) , Positives = 278/416 (66%) , Gaps = 4/416 (0%) 

Query: 11 DVLLTHFNQLFCLNDPGHPLTGQEMKKATIVEDGYIAIKDGLIVALGSGEPDAELVGTQT 70 

D LL + QL + G P G+EM + ++E + I+DG + +G+ Q 
Sbjct: 6 DTLLWIGQLLPMESKG-PKRGKEMSELQLLEHAALGIRDGKVAFIGTMVEADTFTANQM 64 


Query: 71 IMRSYKGKIATPGIIDCHTHLVYGGSREHEFAKKLAGVSYLDILAQGGGILSTVRATRSA 130 

I +GK+ TPG++D HTHL+ +GGSREHE A K GV YL+IL GGGIL+TV ATR+A 
Sbjct: 65 I - - DCQGKL VT PGLVD PHTHL I FGGSREHEMALKQQGVP YLE I LKNGGG I LATVEATRAA 122 

Query: 131 SFDNLYQKSKRLLDYMLLHGOTTVEAKSGYGLDWETEKRQLDWAALEKDHPIDLVSTFM 190 

S + L K+ L+ ML +GVTT+EAKSGYGLD ETE +QL A+ + HPID+VSTF+ 
Sbjct: 123 SEEELITKAICHLNRMLSYGVTTIEAKSGYGLDRETEWKQLRAAKAVGEQHPIDIVSTFL 182 

Query: 191 AAHAIPEEYKGNPKAYLDVIIKDMLPVVKEENLAEFCDIFCEKNVFTADESRYLLSKAKE 250 
AHAIP ++ +P +LD + DML +KE+NLAEF DIF E VFT ++SR L KAKE 

Sbjct: 183 GAHAIPTSHRNDPDRFLDEMA-DMLGEIKEQNLAEFVDIFTETGVFTVEQSRTFLQKAKE 241 

Query: 251 MGFKLRIHADEIASIGGVDVAAELSAVSAEHLMMITDDG1AKLIGAGVIGNLLPATTFSL 310 
GF L++HADEI +GG ++A EL A+SA+HL+ +D GI K+ AG I LLP TTF L 
Sbjct: 242 RGFGLKLHADEIDPLGGAELAGELGAISADHLVGASDQGIQKMAAAGTIACLLPGTTFYL 301 

Query: 311 MEDTYAPARKMIDAGMAITLSTDSNPGSCPTANMQFVMQLGCFMLRLTPIEVLNAVTINA 370 

+DTYA AR MID G+A+T+STD NPGS PT N+Q +M + L++TP E+ +AVT+N 
Sbjct: 302 GKDTYARARDMIDQGLATCISTDFNPGSSPTENLQLIMSIAALRLKMTPEEIWHAVTVNG 361 






Query: 371 AYSVNRQERVGSLTVGKEADIAIFDAPNIDYPFYFFATNLIHQVYKKGQLTVDRGR 426 

A+++ R + G L VG+ AD+ ++DA N Y Y + N +H V+KKG++ +R R 
Sbjct: 362 AHAIGRGDTAGQLAVGRAADVVVWDAKNYYYVPYHYGVNHVHSWKXGEVVYERRR 417 



Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2932 

A DNA sequence (GASx2056) was identified in S.pyogenes <SEQ ID 8363> which encodes the amino acid 
sequence <SEQ ID 8364>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1847 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB61139 GB:AL132952 predicted using Genefinder-cDNA EST 
yk155e6.3 comes from this gene-cDNA EST yk155e6.5 comes 
from this gene-cDNA EST yk156d6.5 comes from this 
gene-cDNA EST yk259b10.3 comes fr 
Identities = 302/649 (46%) , Positives = 419/649 (64%) , Gaps = 17/649 (2%) 


Query: 29 EGIRRAPDRGFRLTQAQTEIALKNALRYVPTKFHEEVIPEFLEELKTRGRIYGYRFRPKD 88 

+ + AP R LTQ + +A++NALRY+P + H + EF EEL T G IYGYRF P 
Sbjct: 85 KMVAHAPKRPOTLTQTEmiAVRNALRYIPKEHHVLIATEFAEElOTYGHIYGYRFMPNF 144 

Query: 89 RIYGKPIDEYKGNCTAAKAMQVMIDNl^SFEIALYPYELVTYGETGSVCANWMQYCLIKK 148 

++ P+ E +C A A+ +MI NNL +A +P ELVTYG G V +NW+Q+ L+ + 
Sbjct: 145 DLFAPPVSEIGAHOTQASAIILMILNNl^KRVAQFPQ 204 

Query: 149 YLEVMTDEQTLVVESGHPVGLFKSKPEAPRVIITNGLLVGEYDNMKDVffilAEEMGVTNYG 208 
YL MTD QTLV+ SGHP+GLF S P++PR+ +TNG+++ Y + ++ +GVT YG 

Sbjct: 205 YLYTMTDHQTLvLYSGHPLGLFPSTPDSPRMTVTNG^^MIPSYSTKELYDKYFALGvTQYG 264 

Query: 209 QMTAGGWMYIGPQGI vHGTFNTLLNAGRLKLGVADDGDLTGKLFISSGLGGMSGAQGKAA 268 
QMTAG + YIGPQGIVHGT T+LNAGR ++G+ L GK+ F+ + +GLGGMSGAQ KAA 

Sbjct: 265 QMTAGSFCYIGPQGIVHGTTITVLNAGR-RMGL DSLAGKVFVTAGLGGMSGAQPKAA 320 

Query: 269 EIAKAVAIIAEVDQSRIKTRHSQGWISQIAESPEEALQLAQKAIDAKESTSIAYHGNIVD 328 

+1A + +IAE+ + + RH QGW+ ++ EE + ++ + KE+ SI Y GN+VD 
Sbjct: 321 KI AGC IGVI AE I SDTALLKRHQQGWLDVYSKDLEE I VNWI KEYREKKEAI S I GYLGNWD 380 


Query: 329 LLE-YVNDKQIHvDLLSDQTSCHNVYDGGYCPVGISFDERTRLLAEDKDTFHQMVDDTLA 387 

L E + + V+L SDQTS HN + GG+ P G++F++ +++ D F ++V ++L 
Sbjct: 381 LWERLAEEPECLVELGSDQTSLHNPFLGGFYPAGLTFEQSNQMMTSDPVKFKKLVQNSLI 440 

Query: 388 ARHFEAI KTLTENGTYFFDYGNAFMKSVYDSGI TE I S KNGRNDKDGFI WPS YVEDIMGPML 447

R AI + G YF+DYGNAF+ +G + ++ ++DK F +PSY++DIMG + 

Sbjct: 441 RQIAAIDKIAAKGMYFWDYGNAFLLECQRAGANLLREDAQDDK-SFRYPSYMQDIMGD-I 498 

Query: 448 FDYGYGPFRWVCLSGNHDDLVATDKAAMEAIDPDR RYQDRDNYNWIRDAEKN 499 

F G+GPFRWVC SG +DL TD+ A + ID + + Q DN WI +AEKN

Sbjct: 499 FSMGFGPFRWVCTSGKPEDLRLTDQTACKIIDELKDTDVPEYVKQQYLDNKKWIEEAEKN 558 

Query: 500 QLWGTQARILYQDCIGRVTIALKFNELVRKGKI-GPVMIGRDHHDVSGTDSPFRETSNI 558 
+LWG+QARILY D GRV +A FNELV+ GK+ ++I RDHHDVSGTDSPFRETSN+ 
Sbjct: 559 KLWGSQARILYSDRAGRVALASAFNELVKSGKVSAAIVISRDHHDVSGTDSPFRETSNV 618

Query: 559 KDGSNVTCDMAVQCYAGNAARGMSLVALHNGGGTGIGKAINGGFGLVLDGSERIDEIIKS 618 

DGS T DMAVQ G++ RG + VALHNGGG G G INGGFG+VLDGS + 
Sbjct: 619 YDGSAFTADMAVQNCIGDSFRGATWVALHNGGGVGWGDVINGGFGIvLDGSSDAARRAEG 678 



Query: 619 AIAWDTMGGVARRNWARNEHAIETAIEYNRLHAGTDHITIPYLADDDLV 667 

+ WD GV RR+W+ N A E AI+ +T+P AD++L+ 

Sbjct: 679 MLNWDVPNGVTRRSWSGNAKAQE-AIQRAEKQVDGLRVTLPVEADEEIjL 726 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2933 

A DNA sequence (GASx2057) was identified in S.pyogenes <SEQ ID 8365> which encodes the amino acid 
sequence <SEQ ID 8366>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1887 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD35925 GB:AE001751 formiminotransferase-cyclodeaminase/formiminotetrahydrofolate cyclodeaminase,
putative [Thermotoga maritima]
Identities = 160/296 (54%), Positives = 214/296 (72%), Gaps = 2/296 (0%)

Query: 3 KI VEC I PNFSEGQNQAVIDGLVATAKS I PGVTLLDYSSDASHNRS VFTLVGDDQSIQEAA 62 

K++E +PNFSEG+ + V++ +VA AK V +LD+S DA HNRSV TLVG+ +++ A 
Sbjct: 2 KLIESVPNFSEGRRKEVVEKIVAEAKKyDRVWVLDWSMDADHNRSVITLVGEPENLlNAL 61 

Query: 63 FQLVKYASENIDMTKHHGEHPRMGATDVCPFVPIKDITTCjECVEISKQVAERINRELGIP 122 

F + K A+E ID+ H G+HPRMGA DV P VP+ + T +ECVE SK + RI ELGIP 
Sbjct: 62 FDMTKKAAELIDLRNHTGQHPRMGAADVIPLVPLYNTTMEECVEYSKILGRRIGEELGIP 121 

Query: 123 IFLYEDSATRPERQNIAKVRKGQFEGMPEKLIiEEDWAPDyGDRKIHPTAGVTAVGARMPL 182 

++LYE SATRPERQNLA +RKG+FEG EK+ + W PD+G ++HPTAGVTAVGAR L 
Sbjct: 122 VYLYEKSATRPERQNIiADIRKGEFEGFFEKIKDPLWKPDFGPDRVHPTAGVTAVGAREFL 181 

Query: 183 VAFNvNLDTDNIDIAHKIAKIIRGSGGGYKYCKAIGVMLEDRHIAQVSMNMVNFEKCSLY 242 

+AFNVNL T ++ IA KIA+ IR S GG +Y KAIGV L+ R + QVS+N+ N +K LY 
Sbjct: 182 lAFNVNLGTRDVKIAEKIARAIRFSSGGLRYVKAIGVDLKGRGWQVSINI'TNHKKTPIiY 241 

Query: 243 RTFETIKFEARRYGvNVIGSEVIGLAPAKALIDVAEYYLQVEDFDYHKQILENHLL 298 

R FE IK EA RYGV V+GSE++GL P ++L+ YYL+ + K+++E++LL 
Sbjct: 242 RVFELIKMEAERYGVPVLGSEIVGLFPLESLLKTVSYYLRTD--LNAKKVIESNLL 295 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
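
As an illustration of how the percentages in the homology summary lines relate to the raw counts, the following minimal Python sketch parses a summary line and recomputes each percentage. The helper names (parse_blast_stats, truncated_percent) are hypothetical, and the rule that the reported figure is the integer part of 100 x count / length is an assumption that agrees with the figures reported in Example 2933; it is not stated in the source.

```python
import re

# Hypothetical pattern for summary lines such as
# "Identities = 160/296 (54%), Positives = 214/296 (72%), Gaps = 2/296 (0%)".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line):
    """Return {name: (count, length, reported_percent)} for each statistic found."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in STAT.finditer(line)
    }


def truncated_percent(count, length):
    """Integer part of 100 * count / length (assumed rounding rule)."""
    return (100 * count) // length


# Summary line taken from Example 2933.
line = "Identities = 160/296 (54%), Positives = 214/296 (72%), Gaps = 2/296 (0%)"
stats = parse_blast_stats(line)

for name, (count, length, reported) in stats.items():
    # Print the recomputed value next to the value reported in the text.
    print(name, count, length, truncated_percent(count, length), reported)
```

A check of this kind can help spot counts or percentages that were damaged when a printed alignment was transcribed, provided the assumed rounding rule holds for the line being checked.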

Example 2934 

A DNA sequence (GASx2058) was identified in S.pyogenes <SEQ ID 8367> which encodes the amino acid 
sequence <SEQ ID 8368>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2776 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA62653 GB:L33465 methenyl tetrahydrofolate cyclohydrolase
[Methylobacterium extorquens]
Identities = 79/198 (39%), Positives = 112/198 (55%)

Query: 7 SLTDFAKVLGSDAPAPGGGSAAALSGANGISLTKMVCELTLGKKKYADYQD1ITEIHAKS 66 

++ F L S AP PGGG AAA+SGA G +L MVC LT+GKKKY + + + ++ KS
Sbjct: 6 TIETFLDGIASSAPTPGGGGAAAISGAMGAALVSMVCS^TIGKKKYVEVEADLMQVLEKS 65 

Query: 67 TALQASLIAAIDKDTEAFNLVSAVFDMPKETDEDKAARRTAMQKALKTAAQSPFEMMTLM 126 

L+ +L ID EAF+ V + +PK TDE+KAAR +Q+ALKTA P + 
Sbjct: 66 EGLRRTLTGMIADDVEAFDAvMGAYGLPKNTDEEKAARAAKIQEALKTATDVPLACCRVC 125 

Query: 127 VEALEITATAVGKSNTNAASDLGVAALNLKM 186 

E +++ K N N SD GVA L+ AGL+ A LNV +N G+ D F + ++ 

Sbjct: 126 REVIDIAEIVAEKGNLNVISDAGVAVLSAYAGLRSAALNVYVNAKGLDDRAFAEERLKEL 185 

Query: 187 QALLDKGCHLADDIYTKI 204 

+ LL + L + IY + 
Sbjct: 186 EGLIAEAGALNERIYETV 203 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2935 

A DNA sequence (GASx2061) was identified in S.pyogenes <SEQ ID 8369> which encodes the amino acid 
sequence <SEQ ID 8370>. Analysis of this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3924 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
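
The localization predictions listed under "Final Results" can also be read programmatically. The sketch below is a hypothetical parser and is not part of the original analysis: it extracts each location, its certainty value, and its label, and returns the row with the highest certainty. It assumes the row layout shown in these examples and tolerates stray spaces inside the number, as can occur in scanned text.

```python
import re

# Hypothetical pattern for rows such as
# "bacterial cytoplasm Certainty=0.3924 (Affirmative)".
ROW = re.compile(
    r"bacterial\s+(\w+)\s+Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples found in text."""
    rows = []
    for m in ROW.finditer(text):
        # Drop any whitespace inside the number, e.g. "0 . 0000" becomes "0.0000".
        value = float(re.sub(r"\s+", "", m.group(2)))
        rows.append((m.group(1).lower(), value, m.group(3).strip()))
    return rows


def predicted_location(text):
    """Return the row with the highest certainty, or None if no rows are found."""
    rows = parse_final_results(text)
    return max(rows, key=lambda r: r[1]) if rows else None


# Values taken from Example 2935.
sample = """
bacterial cytoplasm Certainty=0.3924 (Affirmative)
bacterial membrane Certainty=0.0000 (Not Clear)
bacterial outside Certainty=0.0000 (Not Clear)
"""

print(predicted_location(sample))
```

When two locations share the same certainty, max returns the first one listed, so the order of the rows in the input decides the tie.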

Example 2936 

A DNA sequence (GASx2063) was identified in S.pyogenes <SEQ ID 8371> which encodes the amino acid 
sequence <SEQ ID 8372>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.06 Transmembrane 231 - 247 ( 231 - 247) 
INTEGRAL Likelihood = -0.53 Transmembrane 2 - 18 ( 1 - 18) 

Final Results 

bacterial membrane Certainty=0.1426 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15971 GB:Z99124 histidase [Bacillus subtilis]
Identities = 236/477 (49%), Positives = 321/477 (66%), Gaps = 2/477 (0%)

Query: 42 VINLDGESLTIEDVIAIARQGVACHIDDSAIEAVNASRKIVDDIVSEKRWYGVTTGFGS 101 
++ LDG SLT DV + + ++E V SR V+ IV +++ +YG+ TGFG 

Sbjct: 1 ^WTLDGSSLTTADVARv^JFDFEFJU^SEESMERVKKSRAAvERIWDEKTIYGINTGFGK 60

Query: 102 LCNVSISPEDTVQLQENLIRTHASGFGDPLPEDAVRAIMLIRINSLVKGYSGIRLSTIEK 161 

+V I ED+ LQ NLI +HA G GDP PE RA++L+R N+L+KG+SG+R IE+ 
Sbjct: 61 FSDVLIQKEDSAALQLNLILSHACGVGDPFPECVSRAMLLLRANALLKGFSGVRAELIEQ 120 

Query: 162 LLELI J NKGVHPYIPEKGSLGASGDIAPLAH^5VLPMLGLGKAVYKGELLSGQEALDKAGID 221 

LL LNK VHP IP++GSLGASGDLAPL+H+ L ++G G+ +++GE + L KAGI 

Sbjct: 121 LLAFLNKRVHPVIPQQGSLGASGDLAPLSHLALALIGQGEVFFEGERMPAMTGLKKAGIQ 180 

Query: 222 KISLAAKEGLALINGTTVLTAVGAIATYDAIQLLKLSDLAGALSI^VHNGITSPFEFJJLH 281

++L +KEGLALINGT +TA+G +A +A +L ++ +L++E GI F+E++H 
Sbjct: 181 PVTLTSKEGLALINGTQAMTAMGWAYIEAEKLAYQTERIASLTIEGLQGIIDAFDEDIH 240 

Query: 282 TIRPQSGQLATARNIRNLLEGSQNTTVATQSRVQDPYTLRCMPQIHGASKDSIAYVKSKV 341 
R Q+ A IR L S TT + RVQD Y+LRC+PQ+HGA+ ++ YVK K+

Sbjct: 241 LARGYQEQIDVAERIRFYLSDSGLTTSQGEIiRVQDAYSLRCIPQVHGATWQTLGYVKEKL 300 

Query: 342 DIEINSVTDNPIICKDG-HVISGGNFHGEPMAQPFDFLGIAISEIGNVSERRVERLVNSQ 400 
+IE+N+ TDNP+I DG VISGGNFHG+P+A DFL IAISE+ N++ERR+ERLVN Q 
Sbjct: 301 EIEMNAATDNPLIFNDGDKVISGGNFHGQPIAFAMDFLKIAISELANIAERRIERLVNPQ 360

Query: 401 LSKLPSFLVKYPGLNSGFMITQYAC^SLASENKvIiAHPASVDSIPSCENQEDFVSMGTTA 460 

L+ LP FL +PGL SG MI QYA ASL SENK LAHPASVDSIPS NQED VSMGT A 
Sbjct: 361 LNDLPPFLSPHPGLQSGAMIMQYAAASLVSENKTLiAHPASVDS I PSSANQEDHVSMGTIA 420 

Query: 461 ARKAFEILKNSRRIVATEIMAACQALDLKPENHELGKGTKVAYDLFRKEVNFIEHDK 517 

AR A++++ N+RR++A E + A QA++ + H TK + RK V I+ D+

Sbjct: 421 ARHAYQVIANTRRVIAIEAICALQAVEYRGIEH-AASYTKQLFQEMRKWPSIQQDR 476 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2937 

A DNA sequence (GASx2064) was identified in S.pyogenes <SEQ ID 8373> which encodes the amino acid 
sequence <SEQ ID 8374>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4483 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG06563 GB:AE004741 probable arginase family protein 
[Pseudomonas aeruginosa] 
Identities = 99/275 (36%), Positives = 147/275 (53%), Gaps = 9/275 (3%)

Query: 53 LIGFKSDKGVYINNGRVQaVESPAAIRTQLAKFPWHLGNQVMVYDVGNIDGPNRSLEQLQ 112

L+GF SD+GV N GR GA P A+R LA WH G Q +YD G+I + LE Q 
Sbjct: 42 LLGFASDEGVRRNQGRO^ARHGPPALRRALAHLAWH-GEQA-IYDAGDIVAGD-DLEAAQ 98 

Query: 113 NSLSKAIKRMCDLNLKPIVLGGGHETAYGHYLGLRQSLSPSDDL AVINMDAHFDLRP 169 

++ + + + + LGGGHE AY + GL + LS + L ++N DAHFDLR 

Sbjct: 99 ECYAQRVADLLACGHRWGLGGGHEIAYASFAGLARHLSRHERLPRIGILNFDAHFDLRH 158 

Query: 170 YDQTGPNSGTGFRQMFDDAVADKRLFKYFVLGIQEHNNNLFLFDFVAKSKGIQFLTGQDI 229

++ +SGT FRQ+ + A FY LGI +N LFD A+ G+++L + + 
Sbjct: 159 AERA--SSGTPFRQIAELCQASDWPFAYCCLGISRLSNTAALFD-QAQRLGVRYLLDRQL 215 

Query: 230 YQMGHQKVCRAIDRFLEGQERVYLTIDMDCFSVGAAPGVSAIQSLGVDPNLAVLVLQHIA 289 
++ +D FL+ + +YLT+ +D APGVSA + GV+ + +++

Sbjct: 216 QPWNLERSEAFLDGFLQSVDHLYLTVCLDVLPAAQAPGVSAPSAHGVEMPWEHLVRRAK 275 

Query: 290 ASGKLVGFDWEVSPPHDIDNHTANLAATFIFYLV 324 
ASGKL D+ E++P D D TA +AA + LV 
Sbjct: 276 ASGKLRIADIAELNPQLDSDQRTARIAARLVDSLV 310

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2938 

A DNA sequence (GASx2065R) was identified in S.pyogenes <SEQ ID 8375> which encodes the amino
acid sequence <SEQ ID 8376>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.37 Transmembrane 375 - 391 ( 375 - 392)

Final Results

bacterial membrane Certainty=0.1150 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB37582 GB:AL035569 putative regulatory protein [Streptomyces
coelicolor A3(2)]
Identities = 95/437 (21%), Positives = 177/437 (39%), Gaps = 28/437 (6%)

Query: 271 EVGALLLIGDTGIGKRTLARQVLANQTQTFQIVTAKCFREEAMDSL--LPWRNILDGLGD 328 
E ALLL G+ G+GK L + A + +V EDLP+LL 
Sbjct: 95 EPQALLLGGEAGVGKTRLVEEFAAAADRRGAWALGGCVEIGADGLPFAPFSTALRALRR 154

Query: 329 LVIQNRLLTTKAWKAALKRCFP-VATIFQEDNNQPFIKDHTSLLVSFIVDILQHLAEIKA 387 

+ + +LRP+A +++L +L+ +A 

Sbjct: 155 HLPEEIAAAAAGQEEELARLLPELAEGTPVTGGGRHDEESMARLFELTARLLERVAARHT 214 

Query: 388 LVILIEDCHWMDEDSLTLLQRVMNQLVHYPIAFVLT KHLGTTPELGLCLNALM 440 

+V+++ED HWD + L+++L + +T + PLL+L 

Sbjct: 215 VVL VLEDLHWADASTRHLIAYLLRTLRTGRLVVLATYRSDDIHRRHPLRPLLAE - LDRLR 273 

Query: 441 SQGRLESICLEPFNRQESLVYINSQLGSQPVTAEEMEHLYQASQGNPFFLSEYTQALLRH 500
+ RLE L F R E I L +P +++ +++ S GN FF+ EAR 
Sbjct: 274 TVRRLE LGRFTRDEVGRQIAGILAHEP-DQLQVDEIFERSDGNAFFVEELAVA-ARV 328 

Query: 501 EKFVPLTPAIKAKLGLKIANLSSRDDALLNYLSC<^PIPI J NTLAQLMLLPLEEVIE^IVD 560 
LT +++ L +++ L + ++ + LA + L +++IE +

Sbjct: 329 GSCTGLTDSLRDLLLTOVEALPESAQRVARIVAEGGSTVEYRLLAAVARLAEDDLIEALR 388 

Query: 561 NLGHYYILVEESVGEEVLISFRQRIIQLYSYDRLSLSKRRLLHGQIAKRLEDLLPILTPS 620

+ + IL+ G+ FR +++ D L +R L+ + A+ L D P L P+ 

Sbjct: 389 SAVNANI LLPAPDGDG - - YRFRHSLWFJWGDDLLPGERSRLNRRYAEAL - DADPTLVPA 445 

Query: 621 PHLLDDIAYHYQESRQVIKALEYNLNYLDATLPFQHELFPIYSKSIGSLEKSDRDHQRLM 680 

+ +A ++ + KAL LDA++ + YS+ + LE++ L 
Sbjct: 446 AERVMRLASYWYHAHAPAKALP AVLDASVEARRR- -HAYSEQLRLLERA MELW 496 

Query: 681 EEQFDKIRQSIADLELT 697 

+ D +R ++ ++ T 
Sbjct: 497 DSAPDDVRATLRPVDCT 513 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2939 

A DNA sequence (GASx2072) was identified in S.pyogenes <SEQ ID 8377> which encodes the amino acid 
sequence <SEQ ID 8378>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3702 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2940 

A DNA sequence (GASx2074R) was identified in S.pyogenes <SEQ ID 8379> which encodes the amino 
acid sequence <SEQ ID 8380>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 21 - 37 ( 21 - 38)

Final Results

bacterial membrane Certainty=0.1362 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2941

A DNA sequence (GASx2075R) was identified in S.pyogenes <SEQ ID 8381> which encodes the amino 
acid sequence <SEQ ID 8382>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3545 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2942 

A DNA sequence (GASx2076R) was identified in S.pyogenes <SEQ ID 8383> which encodes the amino 
acid sequence <SEQ ID 8384>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2340 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC44494 GB:U44893 orf108; unknown function [Butyrivibrio
fibrisolvens]
Identities = 42/75 (56%), Positives = 55/75 (73%)

Query: 1 LLKGTLRFGQLKSSIGSVSQKVLTAQLRAMEADGLVHREVYAEVPPRVEYSLTETGLSLA 60

LL RF +LK+++ +SQKVLT LR+ME DG++ R VY EVPPRVEYSL+E G S+ 
Sbjct: 31 LLWPWRFNELKNNLEGISQKVLTDSLRSMEEDGIITRTVYPEVPPRVEYSLSELGESMR 90 

Query: 61 PVIEAMSDWGQTYQE 75 
P+I+AM WG Y+E

Sbjct: 91 PIIKAMEQWGTEYKE 105

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2943

A DNA sequence (GASx2097) was identified in S.pyogenes <SEQ ID 8385> which encodes the amino acid 
sequence <SEQ ID 8386>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.40 Transmembrane 26 - 42 ( 23 - 44)

Final Results

bacterial membrane Certainty=0.2359 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2944 

A DNA sequence (GASx2098) was identified in S.pyogenes <SEQ ID 8387> which encodes the amino acid 
sequence <SEQ ID 8388>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1385 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2945 

A DNA sequence (GASx2100) was identified in S.pyogenes <SEQ ID 8389> which encodes the amino acid 
sequence <SEQ ID 8390>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2138 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA98589 GB:L44593 ORF79; putative [Lactococcus lactis phage
BK5-T]
Identities = 34/62 (54%), Positives = 44/62 (70%)

Query: 3 QITLKAARINAGYTLKQVAGAVGKNPQTISKYEKDSSDISLGLLQKLSSLYGVTIDNLFL 62

+I LKAAR NA ++ K+VA VGKN QTI YEKDS++I + L KL+ +Y ID +FL
Sbjct: 8 KIKLKAARTNADFSAKEVAEIVGKNYQTILSYEKDSTEIPMSLAIKLAEIYDYPIDFIFL 67

Query: 63 GK 64 
GK

Sbjct: 68 GK 69 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2946 

A DNA sequence (GASx2103) was identified in S.pyogenes <SEQ ID 8391> which encodes the amino acid
sequence <SEQ ID 8392>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3316 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2947

A DNA sequence (GASx2104) was identified in S.pyogenes <SEQ ID 8393> which encodes the amino acid 
sequence <SEQ ID 8394>. Analysis of this protein sequence reveals the following: 

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4371 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2948 

A DNA sequence (GASx2105) was identified in S.pyogenes <SEQ ID 8395> which encodes the amino acid 
sequence <SEQ ID 8396>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2263 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2949 

A DNA sequence (GASx2106) was identified in S.pyogenes <SEQ ID 8397> which encodes the amino acid
sequence <SEQ ID 8398>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -6.42 Transmembrane 9 - 25 ( 6 - 29)

Final Results

bacterial membrane Certainty=0.3569 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2950 

A DNA sequence (GASx2107) was identified in S.pyogenes <SEQ ID 8399> which encodes the amino acid 
sequence <SEQ ID 8400>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1355 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2951 

A DNA sequence (GASx2108) was identified in S.pyogenes <SEQ ID 8401> which encodes the amino acid
sequence <SEQ ID 8402>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3050 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2952 

A DNA sequence (GASx2109) was identified in S.pyogenes <SEQ ID 8403> which encodes the amino acid 
sequence <SEQ ID 8404>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3628 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB46557 GB:AJ242479 putative replication protein [Streptococcus thermophilus]
Identities = 143/242 (59%), Positives = 180/242 (74%), Gaps = 2/242 (0%)

Query: 1 MAI YEARGFSSYLY- - PYKGPLEPPDYIAQFRPLKPPEDIDIEEYKRTOAPYCLSGKVTA 58 
MAIYE+RGF + L+ +PF ++A FRP+K P+ DI ++KR APYC+SG+V 

Sbjct: 1 MAIYESRGFGNILHLNNSNASKDPFKFVATFRPMKVPQGEDIADFKRYHAPYCISGEVKQ 60

Query: 59 EKNGSYKRNNASLVYRDLIFLDYDEIETGVNLPKIVSQTLWEYSYI IYPTIKHTPEKPRY 118 

+++G+YKRNNASL+YRDLIFLDYD++E + P+ VS L YSY+IYPTIKHT EKPRY 
Sbjct: 61 DEDGNYKRNNASLLYRDLIFLDYDKLEASTDFPRAVSNALNGYSYVIYPTIKHTAEKPRY 120 

Query: 119 RLVMKPSDVMTEATYKQWKEIADKIGLPFDLASLTWSQLQGLPVTTGDPEDYQRYVNHG 178 

RLV+KP+D M E TYK +EIADKIGLPFD +SLTWSQLQGLPVTTGDPE Y+R VN G 
Sbjct: 121 RLVVKPTDKI^EQTYKATAQEIADKIGLPFDDSSLTWSQLQGLPVTTGDPEKYERIVNRG 180 

Query: 179 LDYPVPKNGSTPNRQVVTTYTPRPRSQRSITMRVIDTLFNGFGNEGGRNVALTKFVGLLF 238

YPV + +TPR +S+TMRV+DTL NGFG+EGGRN+ +T+FVGLL 

Sbjct: 181 RCYPVANPNTVKANHSPNYHTPRQSGDKSLTMRVVDTLIiNGFGDEGGRNIEVTRFVGLLL 240 

Query: 239 NK 240 

+K

Sbjct: 241 SK 242 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2953

A DNA sequence (GASx2110) was identified in S.pyogenes <SEQ ID 8405> which encodes the amino acid
sequence <SEQ ID 8406>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5215 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB46558 GB:AJ242479 putative DNA primase [Streptococcus thermophilus] 
Identities = 274/548 (50%), Positives = 363/548 (66%), Gaps = 17/548 (3%)

Query: 17 DLKNLENEITEARE NEDKYFSTFKGVRGQLI KECQEMKDEAFKIAYDGVMADSK 70 

DL LE E E+++ +ED Y TFK +R Q I ++ K+ A++ YD M + K

Sbjct: 8 DLTKLEEEYNESKKEASTLFDEDGYLKTFKDIRKQFINILEQKKEIAYQKGYDLYMNNPK 67 

Query: 71 HLENVKAGRLTEVQHE ELAKEKGQEASEKALPKTPLGVAIMLKHYLRFIRVKP 123 

L + EE E AK++G++A + A PKTPL A LK Y+RFIR++P 

Sbjct: 68 VLLKLAKAEKDEENGELIRKTVIEDAKKEGEKAKKNATPKTPLECAEFLKKYIRFIRIRP 127

Query: 124 EAQGQKAPLYFFHPDHGVWLEDNEFLQDLISVIFPNATEKQAFDTLYKIARQSQLKEIQR 183 

+ +G++ F G++LED+EFL DL+ I PN TE+ D LYKIA LK+ Q 

Sbjct: 128 KGKGRERLYTFTRQILGIYLEDDEFLHDLMVTIHPNNTERLGNDALYKIAHSVPLKDKQE 187 

Query: 184 EYTVIGNQLYNYKTGQFEELTPDITVTRKIKTGYNKKAKEPTIKGWKPTAWLLELFDGDA 243 

Y V+G +LYN +TG+F + PI VTRK++ GYN A EP I GWKPT WL LF+GD 
Sbjct: 188 NYWVGGELYNNETGEFTQFDPRI I VTRKVRMGYNPDATEPI IDGWKPTVWLKGLFNGDR 247 

Query: 244 ELYNLAIQIIKASITGQSLQKIFWLFGEGGTGKGTFQQLLINLVGMDNVASLKITELAKS 303

+ Y+LAIQII+A+ITG++L+ IFWL+GEGGTGKGTFQ hh NLVG +NVAS KI + A 
Sbjct: 248 DSYDLAIQI IRATITGKTLENIFWLYGEGGTGKGTFQTLLENLVGSENVASFKI -DGASG 306 

Query: 304 RFTTSILLGKSIVIGDDIQKTJAVIKDTSDIFSIATGDIMTIEIDKGKRPYSIRLNMTVVQS 363 
+F TSIL+GK++VIGDDIQKD VIKDTS +FSLATGD + IEDRGKRPY+ R MTWQS

Sbjct: 307 KFDTSILIGKTWIGDDIQKDWIKDTSWFSLATGDPIRIEDKGKRPYTTRKRMTVVQS 366 

Query: 364 SNGLPRIMGDKSAIDRRFRILPFTKVFKGKPNKAIRNDYINRKEVLEYLLKLAIETPITD 423 
SNG PRMN D+ AI+RRFR+L F+++ KGK +K I+NDY+ RKEVLEY +KLAIETP D 
Sbjct: 367 SNGFPR^ADQKAINPJ^FRVliTFSEL-KGKADKRIKNDYVGRKEVLEYFVKLAIETPFRD 425

Query: 424 INPKASIEILEEHHKEMNPVIDFVSKFFTDE-LTSEFIPNSFVYHVWKGFLEYYDIKQ-I 481 

+NP+ SIE L+E +KEMNPV DFV +FF DE + ++PN +V+ +K + E + 
Sbjct: 426 TOPQKSIEFLDEAYKE^PVADFVDRFFNDEVIKCNYVPNGYVFECFKAYCEKNQNRNYF 485 

Query: 482 KSERGLHKE I KSNLPEGFEAGQKVI PVGRQLHTGFYPKEDLPLFASASYANGRAS PEKRK 541 

+ R LHK+IK LP+ F + I G++ + F P + +Y NGR E ++ 

Sbjct: 486 IiNSRTLHKQI KKI LPKTFRPKEVT I KKGQKFYEEFNPHLVSNPWHFDAYDNGRNKKEDQQ 545 

Query: 542 KPKNERGY 549

K ERGY 
Sbjct: 546 DAKKERGY 553 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2954 

A DNA sequence (GASx2111) was identified in S.pyogenes <SEQ ID 8407> which encodes the amino acid
sequence <SEQ ID 8408>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0994 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2955 

A DNA sequence (GASx2112) was identified in S.pyogenes <SEQ ID 8409> which encodes the amino acid
sequence <SEQ ID 8410>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3058 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2956 

A DNA sequence (GASx2114) was identified in S.pyogenes <SEQ ID 8411> which encodes the amino acid
sequence <SEQ ID 8412>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2815 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2957 

A DNA sequence (GASx2115R) was identified in S.pyogenes <SEQ ID 8413> which encodes the amino
acid sequence <SEQ ID 8414>. Analysis of this protein sequence reveals the following:

Possible site: 27 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2958 

A DNA sequence (GASx2116) was identified in S.pyogenes <SEQ ID 8415> which encodes the amino acid
sequence <SEQ ID 8416>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4213 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2959 

A DNA sequence (GASx2117) was identified in S.pyogenes <SEQ ID 8417> which encodes the amino acid
sequence <SEQ ID 8418>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3091 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2960

A DNA sequence (GASx2118) was identified in S.pyogenes <SEQ ID 8419> which encodes the amino acid
sequence <SEQ ID 8420>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2961

A DNA sequence (GASx2119) was identified in S.pyogenes <SEQ ID 8421> which encodes the amino acid
sequence <SEQ ID 8422>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2531 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF63071 GB:AF158600 gp137 [Streptococcus thermophilus bacteriophage Sfi11]

Identities = 41/121 (33%), Positives = 65/121 (52%), Gaps = 3/121 (2%)

Query: 4 KNAIRKLKEFHRWQRIAN-SLDLTYTELYQFDIEYHPTRR--KHLEISRECALEELDAIR 60
K RKL+E+ RW+ IA+ S + T+ + F + +++ + R AL EL+AI
Sbjct: 13 KRCKRKLREYPRmEIAHDSAEQKITQEFTFMPRGGGVNKPVENIAVRRVDALNELEAIE 72

Query: 61 YAINQLSKVEYRQILIECYLISEEKTQQDIMEELNGSQSWYYESKKRALLEFVEFYRDGAL 121
A+N L + +YR+ILIE YL K I + + ++ + E ++L F E YRDG L
Sbjct: 73 QAVNGLYRPDYRRILIEKYLAYPPKPNWQIAQSIGFERTAFQELLNNSILAFAELYRDGRL 133

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
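
The "Identities = 41/121 (33%)" summary reported for the homology hit above pairs a count with an alignment length. A minimal sketch of how such a line can be read back and the percentage recomputed is given below. It assumes the printed percentage is truncated rather than rounded, which matches the identity and gap figures for this hit; the sketch does not attempt to reproduce the Positives percentage, and the helper names are illustrative only.

```python
import re

# Matches "Label = count/length" for the three labels used in the summary line.
STATS_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {label: (count, alignment_length)} from a summary line."""
    return {label: (int(n), int(d)) for label, n, d in STATS_RE.findall(line)}


def percent_floor(count, length):
    """Integer percentage, truncated (assumption: the report does not round)."""
    return count * 100 // length


line = "Identities = 41/121 (33%), Positives = 65/121 (52%), Gaps = 3/121 (2%)"
stats = parse_alignment_stats(line)

print(stats["Identities"], percent_floor(*stats["Identities"]))
print(stats["Gaps"], percent_floor(*stats["Gaps"]))
```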

Example 2962 

A DNA sequence (GASx2120) was identified in S.pyogenes <SEQ ID 8423> which encodes the amino acid
sequence <SEQ ID 8424>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2666 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2963 

A DNA sequence (GASx2121) was identified in S.pyogenes <SEQ ID 8425> which encodes the amino acid
sequence <SEQ ID 8426>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2964 

A DNA sequence (GASx2123R) was identified in S.pyogenes <SEQ ID 8427> which encodes the amino
acid sequence <SEQ ID 8428>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3441 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2965

A DNA sequence (GASx2132) was identified in S.pyogenes <SEQ ID 8429> which encodes the amino acid
sequence <SEQ ID 8430>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2966 

A DNA sequence (GASx2136) was identified in S.pyogenes <SEQ ID 8431> which encodes the amino acid
sequence <SEQ ID 8432>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.19 Transmembrane 57 - 73 ( 54 - 78)

Final Results

bacterial membrane Certainty=0.2275 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB18271 GB:U74623 CadX [Staphylococcus lugdunensis]
Identities = 50/110 (45%), Positives = 76/110 (58%)

Query: 11 MKKDSICQVGVINQQNVTTATNYLEKEKVQKSLRILSKFTDNKQINIIFYLLAVEELCVC 70
M ++ C V +++ V A ++LE +K +K L IL K D K++ II L+ +ELCVC
Sbjct: 1 MSYENACDVICVHEDKVNNALSFLEDDKSKKLLNILEKICDEKKLKIILSLIKEDELCVC 60

Query: 71 DIACLLNLSMASASHHLRKLANQNILDTRREGKIIYYFIKDEEIRDFFNQ 120
DI+ +L +S+AS SHHLR L ++LD ++GK+ YYFIKD+EIR+FF++
Sbjct: 61 DISLILKMSVASTSHHLRLLYKNDVLDFYKKGKMAYYFIKDDEIREFFSK 110

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2967 

A DNA sequence (GASx2137) was identified in S.pyogenes <SEQ ID 8433> which encodes the amino acid 
sequence <SEQ ID 8434>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4582 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics.

Example 2968

A DNA sequence (GASx2139) was identified in S.pyogenes <SEQ ID 8435> which encodes the amino acid 
sequence <SEQ ID 8436>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.89 Transmembrane 63 - 79 ( 54 - 80)

Final Results

bacterial membrane Certainty=0.3357 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2969 

A DNA sequence (GASx2141R) was identified in S.pyogenes <SEQ ID 8437> which encodes the amino
acid sequence <SEQ ID 8438>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4663 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2970 

A DNA sequence (GASx2142) was identified in S.pyogenes <SEQ ID 8439> which encodes the amino acid
sequence <SEQ ID 8440>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have a cleavable N-term signal seq.

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD35257 GB:AE001701 conserved hypothetical protein [Thermotoga maritima]
Identities = 81/275 (29%), Positives = 137/275 (49%), Gaps = 29/275 (10%)

Query: 9 FKGMIIALGFILPGVSGGVLAAILGIYERMISFLAHMRDNFIENVLFFLPVGIG---GIL 65
F G+++ + ++PGVSGG +A ++G+YE++I + ++ +PVG G G+
Sbjct: 7 FSGVLMGIANWPGVSGGTIAVLMGVYEKLIESVNSFFHGNSRSLKVLIPVGAGVLVGVF 66

Query: 66 GIALFSFPVEFLLKHYQVSVLWGFAGAIVGTIPSLIKESTKQSQRDKADWLWLVLTFVIS 125
GIA F +E L Y V + F G I I S +K TK+ K + + FV+
Sbjct: 67 GIARF---LEIFLSKYPVPTHFFFLGLIIVSFVK--TKEYFSIKP VNIFFVLL 114

Query: 126 GLGLYFLNDLIG--TLPANFLTFILAGALIALGVLVPGLSPSNLLLILGLYGPMLIGFKS 183
G+ L F+ G T + +L G + A ++VPG+S S +LLI G+Y +L
Sbjct: 115 GMFLIFMLHFSGETTAKESMFLLVLGGFVAATAMWPGISGSLILLIFGVYDHVLYLVSH 174

Query: 184 LDLLGTFLPIAIGGVLAILAFSKSMDYALQHHHSKVYHFIIGIVLSSTLLILIPNSSSPE 243
L ++G L +IG V IL K M++ L+ + Y FI G++L+S L ++P +
Sbjct: 175 L-IIGELLIFSIGWAGILVSWIMNFLLKRFREETYSFIGGMILAS-LYEVLPKKMNTN 232

Query: 244 SISYSHAGILTWLMAFVLFALGIWLGLWMSQLEEK 278
+ L + + L + LG ++ +E+K
Sbjct: 233 W ' LPSVLSLVLSLTLGFFLLYIEKK 257

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2971 

A DNA sequence (GASx2143R) was identified in S.pyogenes <SEQ ID 8441> which encodes the amino 
acid sequence <SEQ ID 8442>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3964 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB05000 GB:AP001511 unknown conserved protein in others [Bacillus halodurans]
Identities = 28/78 (35%), Positives = 37/78 (46%)

Query: 44 EVDKVFIVPLRQLLFTDPVYYRLEVTPIETTDFPFDRIRNGKYYQFSQEYRSIPFYENLE 103
EVD VF VP+ + P YR+ V FP +RI N YQ S + FY
Sbjct: 127 EVDHVFTVPIDHFISHPPEQYRINVHFEPGAGFPIERIANQSAYQKSTRQITESFYYYQS 186

Query: 104 ETIWGMTAQFTKCLTDIL 121
IWG+TA+ + + IL
Sbjct: 187 YVIWGLTAKILRHVITIL 204

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2972

A DNA sequence (GASx2144R) was identified in S.pyogenes <SEQ ID 8443> which encodes the amino 
acid sequence <SEQ ID 8444>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4761 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2973 

A DNA sequence (GASx2145) was identified in S.pyogenes <SEQ ID 8445> which encodes the amino acid 
sequence <SEQ ID 8446>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.09 Transmembrane 2 - 18 ( 1 - 19)

Final Results

bacterial membrane Certainty=0.2635 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA49519 GB:X69895 X [Bacillus sphaericus]
Identities = 40/97 (41%), Positives = 57/97 (58%), Gaps = 5/97 (5%)

Query: 10 IEFLILAIVEKNDSYGYDISQTIKLVAN IKESTLYPI LKKLEKAGFLTTYSQE - HQ 64
++ +IL ++ + D YGY+ISQ I N IKE+TLY + ++LEK + Y +
Sbjct: 11 LDSIILRLILEKDRYGYEISQEISNRTNNSFQIKEATLYAVFQRLEKKEVIEAYYGDVSD 70

Query: 65 GRKRKYYAVTSSGRAQLIFLKKEWQSYKFALDGIIEG 101
G KRKYY +TS G+A L L KEW K +D +EG
Sbjct: 71 GGKRKYYRITSLGKAYLSELVKEWAEVKEIIDLFMEG 107

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2974

A DNA sequence (GASx2146) was identified in S.pyogenes <SEQ ID 8447> which encodes the amino acid 
sequence <SEQ ID 8448>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.75 Transmembrane 97 - 113 ( 77 - 143)

INTEGRAL Likelihood = -6.85 Transmembrane 116 - 132 ( 114 - 143)

INTEGRAL Likelihood = -5.68 Transmembrane 156 - 172 ( 149 - 175)

INTEGRAL Likelihood = -5.47 Transmembrane 79 - 95 ( 77 - 96)

Final Results

bacterial membrane Certainty=0.6901 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 
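
The INTEGRAL lines reported for the example above each give a likelihood and a predicted transmembrane span. The sketch below shows one way to collect them into a list ordered along the protein. It assumes only the line layout printed in this document; the regular expression and the name `parse_tm_segments` are illustrative assumptions, and the bracketed outer range on each line is ignored.

```python
import re

# Assumed layout: "INTEGRAL Likelihood = <value> Transmembrane <start> - <end> (...)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_segments(text):
    """Return [(start, end, likelihood), ...] sorted by start position."""
    segments = [(int(s), int(e), float(lk)) for lk, s, e in TM_RE.findall(text)]
    return sorted(segments)


example = """
INTEGRAL Likelihood =-14.75 Transmembrane 97 - 113 ( 77 - 143)
INTEGRAL Likelihood = -6.85 Transmembrane 116 - 132 ( 114 - 143)
INTEGRAL Likelihood = -5.68 Transmembrane 156 - 172 ( 149 - 175)
INTEGRAL Likelihood = -5.47 Transmembrane 79 - 95 ( 77 - 96)
"""

segments = parse_tm_segments(example)
for start, end, likelihood in segments:
    # Each predicted core span in this example is 17 residues long.
    print(start, end, end - start + 1, likelihood)

# The most negative likelihood marks the strongest prediction in this output.
strongest = min(segments, key=lambda seg: seg[2])
print(strongest)
```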

Example 2975 

A DNA sequence (GASx2147) was identified in S.pyogenes <SEQ ID 8449> which encodes the amino acid
sequence <SEQ ID 8450>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.11 Transmembrane 8 - 24 ( 6 - 30)

Final Results

bacterial membrane Certainty=0.3845 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.agalactiae. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF04457 GB:AF078161 lacunin [Manduca sexta]
Identities = 68/310 (21%), Positives = 117/310 (36%), Gaps = 12/310 (3%)

Query: 55 DIDSSASTITVETGPVQRPTVTYYTHPKLIDPIVTTVTGKTLSLSQTPKDWITGGIEIL 114 

DI+ + ++ + E+ T++ T + TTTT+ST+I + 

Sbjct: 1004 DIEGTTASGSTESTFTDETTMSKVTEESSVAEEETTKTTITEEVSGTSESASINSDKTTM 1063 

Query: 115 GFTLNNSRQEKNYRSIT--ITVPEKTSLNEVKASNVPHTTLSNLT--VQDMQFDGNLTLL 170

++ + IT +TV E+TS TT+S ++ + T 

Sbjct: 1064 TTLSEDTGKTSVSEEITTEMTVTEETSETSPTEGTSDKTTMSTVSEETESSSVTEETTTE 1123 

Query: 171 HTKVKKATITGMLEATKSQLTNLELKADYSFSNLTDSSVE-NGTISLGNGQLTTKDTTLK 229 

T V+ AT E T S T + ++ S +++ E T + T T+ K 

Sbjct: 1124 TTVVENATDISSTEWASDKTTMTTMSEESEKTTEEATTEITVTKEVTESSSTETATSDK 1183 

Query: 230 AVNIQSLHPGGIE-AERTTLENVTFTVSKSKEEEENDYYDNDAIFTAHALTLKGTNTITG 288 

++ S G AE +T E VT T + EE T+ +T+K T T 

Sbjct: 1184 TISTLSEETGKTSVAEESTTEKVTETTVTTMPEETGK TITSEEITIKTTVTEEP 1237 

Query: 289 GDIDVDITLTKAKAIAYRARTENGKVSLGSQLTPAKIGKESTSDVISYVAENKAATGNLT 348 

D+ +T K A E GK S+ + T E++++ S A T T 

Sbjct: 1238 TDVGSSEAITSDKTTOSTASEETGKYSVSEEETVKTTVAEASTEPSSTEAITSDKTKMST 1297 

Query: 349 VNLNKGDITI 358 

++ G ++ 
Sbjct: 1298 ISEETGKTSV 1307 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2976

A DNA sequence (GASx2148R) was identified in S.pyogenes <SEQ ID 8451> which encodes the amino
acid sequence <SEQ ID 8452>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2977 

A DNA sequence (GASx2160) was identified in S.pyogenes <SEQ ID 8453> which encodes the amino acid 
sequence <SEQ ID 8454>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1630 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2978 

A DNA sequence (GASx2170R) was identified in S.pyogenes <SEQ ID 8455> which encodes the amino
acid sequence <SEQ ID 8456>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.32 Transmembrane 181 - 197 ( 175 - 203)

Final Results

bacterial membrane Certainty=0.6328 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics.

Example 2979

A DNA sequence (GASx2174) was identified in S.pyogenes <SEQ ID 8457> which encodes the amino acid
sequence <SEQ ID 8458>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.39 Transmembrane 3 - 19 ( 3 - 19)

Final Results

bacterial membrane Certainty=0.1956 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2980 

A DNA sequence (GASx2181R) was identified in S.pyogenes <SEQ ID 8459> which encodes the amino
acid sequence <SEQ ID 8460>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3751 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2981 

A DNA sequence (GASx2185R) was identified in S.pyogenes <SEQ ID 8461> which encodes the amino
acid sequence <SEQ ID 8462>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane Certainty=0.1362 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae. 

The protein has no significant homology with any sequences in the GENPEPT database.

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2982 

A DNA sequence (GASx2186R) was identified in S.pyogenes <SEQ ID 8463> which encodes the amino 
acid sequence <SEQ ID 8464>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4803 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA78948 GB:Z17279 transposase [Streptococcus salivarius]
Identities = 48/77 (62%), Positives = 55/77 (71%), Gaps = 1/77 (1%)

Query: 1 VSMKPIDLSK^SIRKRSKKVMKTNKKTLGKSIEERPEYINDRSEFGHWEIDLALGKKTK 60 

+ +K IDL + V IRK+ K T KK LGKSIEERPE IN+RS FG WEID LG KT 
Sbjct: 150 LEIKVIDLPRAVRIRKKFTKRPST-KKHLGKSIEERPEEINNRSRFGDWEIDSVLGGKTI 208 

Query: 61 SEAVMLTLVERQTRYAL 77 

E +LTLVERQTRYA+ 
Sbjct: 209 GEPSILTLVERQTRYAV 225 

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful 
antigens for vaccines or diagnostics. 

Example 2983 

A DNA sequence (GASx2187R) was identified in S.pyogenes <SEQ ID 8465> which encodes the amino 
acid sequence <SEQ ID 8466>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3287 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.agalactiae.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA78948 GB:Z17279 transposase [Streptococcus salivarius]
Identities = 48/87 (55%), Positives = 57/87 (65%)

Query: 1 MNMSNINSTRKSSYSHLSATERGEIAAYLKMGKKPVEIARLLGSHRSTICREIKRGSVDQ 60 

MNMS ST SY HLS ERGEI AYL +G KP EIAR LG +RSTI REI RGS+ Q 
Sbjct: 1 MNMSTNYSTTNQSYKHLSEAERGEIEAYLSVGLKPAEIARRLGRNRSTITREINRGSITQ 60 



Query: 61 VKDKNGKQTFFNAYFADSRQRVYETNR 87 

VK NG++ ++ Y+AD+ Y R 
Sbjct: 61 VKKVNGQKVYYQHYYADAAHNRYRHAR 87

Based on this analysis, it was predicted that this GAS-specific protein and its epitopes, could be useful
antigens for vaccines or diagnostics. 

Example 2984 

A DNA sequence <SEQ ID 9013> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9014>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 10.50 
GvH: Signal Score (-7.5): -5.2 

Possible site: 40
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 4 value: -12.26 threshold: 0.0

INTEGRAL Likelihood =-12.26 Transmembrane 98 - 114 ( 94 - 116)
INTEGRAL Likelihood = -8.17 Transmembrane 5 - 21 ( 1 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 62 - 78 ( 57 - 80)
INTEGRAL Likelihood = -5.84 Transmembrane 37 - 53 ( 30 - 55)
PERIPHERAL Likelihood = 17.35 81

modified ALOM score: 2.95

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5904 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01345(292 - 636 of 951) 

PIR|G64646|G64646(56 - 168 of 205) hypothetical protein HP1015 - Helicobacter pylori 
(strain 26695) 
%Match =4.4 

%Identity =30.6 %Similarity =54.1 

Matches = 34 Mismatches = 46 Conservative Sub.s = 26 



87 117 147 177 207 237 267 297 

LSGMGATFVPQTLIHRYLDKECNVYHFHKNKLFSEYIMIYKKDVELSGIALLLYKAFLTK*FR*FY*KSVYFLPKSV*NR 

I 

RYFLQNIIHIHQNKELQFIKKCLLGYFFAPLCGAILLVLFIVSSGAKSFQISNLFNN 
10 20 30 40 50 

327 357 381 411 441 471 501 

PMIYKI IASLFLVLI PI FSQVL- - VKI FKLKKFNIMFPDVAFPIFVFLI PLI SSSLLKQNLLPYYLILISLLAIGITI - - 
: | :: |||| :::::[:: | : || : :: | | |: | :||: |::| 

QLAYIVLLSLFLCALGFIAGAIGFYRLSKITRHLSFFENFAFSFLAVILCAILSYLV PNASNALSLIGNGVSIFY 

70 80 90 100 110 120 130 

549 579 606 636 666 696 726 756 

--KLLRTKTLFSYKRFLKLFWRSGF-ILTFLFYLGLLVIIFIKVQ*KELDKLNCTPKVRQKI*RLGCFSDEIKD*R*TRN 

II I :|== =11 = III =1 I I I II 1= 

LHKLYRELSLYTQERF FLSGFRLLLFSFMLALLGILVQALVIIFLTTAWLMCVALGFLARAFLNFSQVFLKA 

150 160 170 180 190 200 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2985 

A DNA sequence <SEQ ID 9015> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9016>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 13.20
GvH: Signal Score (-7.5): -2.08 

Possible site: 34 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 10.45 threshold: 0.0

PERIPHERAL Likelihood =10.45 36
modified ALOM score: -2.59

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

43.9/72.0% over 56aa

Streptococcus pneumoniae

EGAD|7626| epua protein Insert characterized

SP|Q03159|EPUA_STRPN EPUA PROTEIN. Insert characterized
GP|47373|emb|CAA38133.1||X54225 7 kDa protein Insert characterized
PIR|S10640|S10640 epuA protein - Insert characterized

ORF01809(331 - 501 of 801)

EGAD|7626|7426(8 - 64 of 64) epua protein {streptococcus pneumoniae} SP|Q03159|EPUA_STRPN
EPUA PROTEIN. GP|47373|emb|CAA38133.1||X54225 7 kDa protein {streptococcus
pneumoniae} PIR|S10640|S10640 epuA protein - Streptococcus pneumoniae
%Match =10.0

%Identity =43.9 %Similarity =71.9

Matches = 25 Mismatches = 16 Conservative Sub.s = 16 



171 201 231 261 291 321 351 381 

RSCLLTYELVQL*SWQEWIiRKGKQ*LAN*PI*TWIINSMKN*RLLVLIL 

||:|:: |:::|::| 
MKMNKKSSYWKRLLLVI I VLILG 
10 20 



411 441 471 501 531 561 591 621 

LLFLAVGLMLGYSVFGDGEHAYSILSLDKWQNIIGKFLGK*KEPL*VI*CL*WFPLRVNFSSRIIQ*QKNKNK*QLRL*L

I I =111 = 11 = = l 1= = = 111 III =1 I I I 
TLALGIGLMVGYGILGKGQDPWAILSPAKWQELIHKFTGN 
40 50 60 

A related DNA sequence <SEQ ID 10507> was identified in GBS which encodes amino acid sequence
<SEQ ID 10508>.

SEQ ID 9016 (GBS 168) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 33 (lane 9; MW 7.6kDa) and in Figure 34 (lane 5; MW 7.6kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 42
(lane 2; MW 32.6kDa).

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful
antigens for vaccines and/or diagnostics.



Example 2986 

A DNA sequence <SEQ ID 9017> was identified in S.agalactiae which encodes amino acid sequence
<SEQ ID 9018>. Analysis of the amino acid sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -2.85
GvH: Signal Score (-7.5): -5.7

Possible site: 21
>>> Seems to have no N-terminal signal sequence
ALOM program count: 0 value: 5.25 threshold: 0.0
PERIPHERAL Likelihood = 5.25 103

modified ALOM score: -1.55

*** Reasoning Step: 3

Final Results

bacterial cytoplasm Certainty=0.1210 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

56.1/72.0% over 131aa 

Escherichia coli 

EGAD | 40237 | arsenate reductase Insert characterized 

SP|P52147|ARC2_EC0LI ARSENATE REDUCTASE (ARSENICAL PUMP MODIFIER). Edit characterized 
20 GP| 1061418|gb|AAB09628.l| |U38947 ArsC {Plasmid R46} Insert characterized 

ORF00095(304 - 699 of 1008) 

EGAD|40237|42398 (1 - 132 of 141) arsenate reductase {Escherichia coli} SP|P52147|ARC2_ECOLI 
ARSENATE REDUCTASE (ARSENICAL PUMP MODIFIER). GP|1061418|gb|AAB09628.1| |U38947 ArsC 
{Plasmid R46} 

%Match =22.0 

%Identity =56.1 %Similarity =72.0 

Matches = 74 Mismatches = 37 Conservative Sub.s = 21 
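
The percentages reported above can be reproduced from the three counts. This is a minimal sketch, assuming that identity and similarity are taken over the number of aligned columns; the function name and the `gaps` parameter are hypothetical, since the listings do not report a gap count. For a gap-free alignment such as this one, the aligned length equals the sum of the three counts.

```python
def alignment_percentages(matches, mismatches, conservative, gaps=0):
    """Return (%identity, %similarity), each rounded to one decimal place.

    `gaps` is a hypothetical extra term for gapped columns; the source
    listings give only the other three counts.
    """
    aligned = matches + mismatches + conservative + gaps
    identity = 100.0 * matches / aligned
    # Conservative substitutions count towards similarity but not identity.
    similarity = 100.0 * (matches + conservative) / aligned
    return round(identity, 1), round(similarity, 1)


# Counts reported for the arsenate reductase alignment (131-132 aa, no gaps).
identity, similarity = alignment_percentages(74, 37, 21)
print(identity, similarity)
```

For alignments that contain gaps, the three counts alone will overestimate the reported values unless the gapped columns are added to the denominator.
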

30 129 159 189 219 249 279 309 339 

RIHSSLSL*PIFHRKRPYPSRAFTyiYFSNSCG*LWC*YCDDWRELLAGLGINFYFLKTLVALKIERKMMEKIRIYHNPNC 



35 



MSNITIYHNPHC 
10 

369 399 429 459 489 519 549 579 

GTSRNVLAIIRHCGIEPEIIYYLKTPPSRMELVELLLEMKLSARELLRTDVPAYEKFNLESSSVTDEEMIDAMIQDPILI 



GTSRNTLEMIRNSGIEPTVILYLETPPSRDELLKLIADMGISVRALLRKNVEPYEELGLAEDKFTDDQLIDFMLQHPILI 
40 30 40 50 60 70 80 90 

609 639 669 699 729 759 789 819 

NRPIVOTSKGAKLCRPCEAILTILPVKMEKDFVKEDGQIIQSL*HIV**IMV*EVSK*HY*KKLMRLETFCKQKASQHQN 

II II III I I I II I I =1 III : I llll: : 
45 NRPI WTPLGTKLCRPSEWLDILPDAQKAAFTKEDGEKWDDSGKRLK 

110 120 130 140 

SEQ ID 9018 (GBS45) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 8 (lane 4; MW 18.6kDa). 

The GBS45-His fusion product was purified (Figure 97A; see also Figure 191, lane 5) and used to immunise 
mice (lane 1 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure 97B), FACS 
(Figure 97C), and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2987 

A DNA sequence <SEQ ID 9019> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9020>. Analysis of the amino acid sequence reveals the following: 
Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 6.84 
GvH: Signal Score (-7.5): 2.98 
Possible site: 25 
>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 0 value: 13.69 threshold: 0.0 
PERIPHERAL Likelihood = 13.69 77 
modified ALOM score: -3.24 

*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) 

A DNA sequence <SEQ ID 10337> was identified in GBS which encodes amino acid sequence <SEQ ID 
10338>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

SEQ ID 9020 (GBS55) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 7; MW 11.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 33 (lane 5; MW 36.3kDa). 

GBS55-GST was purified as shown in Figure 197, lane 5. 

GBS671 was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown 
in Figure 161 (lane 2-4; MW 12kDa) and in Figure 188 (lane 2; MW 12kDa). Purified protein is shown in 
Figure 242, lane 3. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2988 

A DNA sequence <SEQ ID 9021> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9022>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: -14.35 
GvH: Signal Score (-7.5): -2.12 
Possible site: 44 
>>> Seems to have no N-terminal signal sequence 

ALOM program count: 4 value: -13.90 threshold: 0.0 

INTEGRAL Likelihood =-13.90 Transmembrane 101 - 117 ( 92 - 126) 
INTEGRAL Likelihood = -7.64 Transmembrane 130 - 146 ( 125 - 148) 
INTEGRAL Likelihood = -6.64 Transmembrane 24 - 40 ( 20 - 45) 
INTEGRAL Likelihood = -2.44 Transmembrane 55 - 71 ( 55 - 75) 

PERIPHERAL Likelihood = 17.40 2 
modified ALOM score: 3.28 
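
The INTEGRAL lines of an ALOM listing can be turned into a list of predicted transmembrane segments. The following is a minimal sketch, assuming the line layout shown above; the `Segment` type and `parse_segments` function are illustrative names and do not come from the original analysis.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative means a stronger membrane prediction
    start: int         # first residue of the predicted core segment
    end: int           # last residue of the predicted core segment


# Captures the likelihood and the core residue range; the extended range
# given in parentheses on each line is ignored in this sketch.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return the predicted segments in the order they are listed."""
    return [
        Segment(float(score), int(start), int(end))
        for score, start, end in INTEGRAL_RE.findall(text)
    ]


sample = """
INTEGRAL Likelihood =-13.90 Transmembrane 101 - 117 ( 92 - 126)
INTEGRAL Likelihood = -7.64 Transmembrane 130 - 146 ( 125 - 148)
INTEGRAL Likelihood = -6.64 Transmembrane 24 - 40 ( 20 - 45)
INTEGRAL Likelihood = -2.44 Transmembrane 55 - 71 ( 55 - 75)
"""

# Sort from N-terminus to C-terminus to read off the membrane topology.
segments = sorted(parse_segments(sample), key=lambda seg: seg.start)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```
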






*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0 . 6562 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

SEQ ID 9022 (GBS215) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 175 (lane 10; MW 45kDa). 
Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2989 

A DNA sequence <SEQ ID 9023> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9024>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 11.66 
GvH: Signal Score (-7.5): -5.3 
Possible site: 61 
>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 2 value: -14.12 threshold: 0.0 

INTEGRAL Likelihood =-14.12 Transmembrane 13 - 29 ( 5 - 35) 
INTEGRAL Likelihood = -8.17 Transmembrane 44 - 60 ( 39 - 65) 
PERIPHERAL Likelihood = 39.00 29 
modified ALOM score: 3.32 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6647 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 
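
The ALOM value, the modified ALOM score and the membrane certainty printed in these listings appear to be related by a simple linear rule. The sketch below encodes that pattern purely as an observation inferred from the numbers in this section; it is an assumption, not a documented formula of the prediction program, and the function names are hypothetical. It can serve as a consistency check when a listing is damaged.

```python
def modified_alom_score(value):
    """Estimate the 'modified ALOM score' from the ALOM 'value'.

    ASSUMPTION: pattern inferred from the listings in this section only,
    e.g. -14.12 -> 3.32 and 13.69 -> -3.24.
    """
    offset = 0.5 if value < 0 else -0.5
    return -value / 5.0 + offset


def membrane_certainty(value):
    """Estimate the membrane certainty from the ALOM 'value'.

    ASSUMPTION: the listings are consistent with certainty being one
    fifth of the modified score, floored at zero.
    """
    return max(0.0, modified_alom_score(value) / 5.0)


# Values taken from the listing above: ALOM value -14.12,
# modified ALOM score 3.32, membrane certainty 0.6647.
print(round(modified_alom_score(-14.12), 2))
print(round(membrane_certainty(-14.12), 4))
```

The estimates agree with the printed values only to within rounding, which suggests the program works from unrounded intermediate numbers.
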

SEQ ID 9024 (GBS217) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 85 (lane 2; MW 36.1kDa) and in Figure 156 (lane 1 & 3; MW 36kDa). 

GBS217-GST was purified as shown in Figure 224, lane 5-6. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2990 

A DNA sequence <SEQ ID 9025> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9026>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 8.20 
GvH: Signal Score (-7.5): -3.7 
Possible site: 33 

>>> Seems to have an uncleavable N-term signal seq 



ALOM program count: 4 value: -9.98 threshold: 0.0 

INTEGRAL Likelihood = -9.98 Transmembrane 22 - 38 ( 12 - 43) 
INTEGRAL Likelihood = -7.80 Transmembrane 61 - 77 ( 56 - 85) 
INTEGRAL Likelihood = -5.20 Transmembrane 121 - 137 ( 117 - 148) 
INTEGRAL Likelihood = -2.97 Transmembrane 99 - 115 ( 98 - 119) 
PERIPHERAL Likelihood = 10.77 5 

modified ALOM score: 2.50 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.4991 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



A related DNA sequence <SEQ ID 10701> was identified in GBS which encodes amino acid sequence 
<SEQ ID 10702>. 
Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2991 

A DNA sequence <SEQ ID 9027> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9028>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: 10.61 
GvH: Signal Score (-7.5): -4.21 
Possible site: 51 
>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 3 value: -10.99 threshold: 0.0 

INTEGRAL Likelihood =-10.99 Transmembrane 38 - 54 ( 33 - 61) 
INTEGRAL Likelihood = -8.01 Transmembrane 5 - 21 ( 1 - 26) 
INTEGRAL Likelihood = -7.01 Transmembrane 65 - 81 ( 60 - 87) 
PERIPHERAL Likelihood = 13.85 99 

modified ALOM score: 2.70 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 . 5394 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2992 

A DNA sequence <SEQ ID 9029> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9030>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -21.39 
GvH: Signal Score (-7.5): -1.85 
Possible site: 57 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -8.44 threshold: 0.0 
INTEGRAL Likelihood = -8.44 Transmembrane 38 - 54 ( 36 - 59) 
PERIPHERAL Likelihood = 19.10 18 
modified ALOM score: 2.19 






*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 .4376 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2993 

A DNA sequence <SEQ ID 9031> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9032>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 12.87 
GvH: Signal Score (-7.5): -3.57 
Possible site: 41 
>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 4 value: -10.30 threshold: 0.0 

INTEGRAL Likelihood =-10.30 Transmembrane 69 - 85 ( 63 - 98) 
INTEGRAL Likelihood = -8.65 Transmembrane 4 - 20 ( 1 - 29) 
INTEGRAL Likelihood = -2.07 Transmembrane 96 - 112 ( 95 - 118) 
PERIPHERAL Likelihood = 9.71 113 
modified ALOM score: 2.56 

*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0 . 5118 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

20.1/50.5% over 114aa 

Streptococcus pneumoniae 

GP|9798572| BlpX protein Insert characterized 
ORF02100(316 - 660 of 999) 

GP|9798572|emb|CAC03527.1| |AJ276410(9 - 123 of 132) BlpX protein {Streptococcus pneumoniae} 
%Match =5.0 

%Identity =20.0 %Similarity =50.4 

Matches = 23 Mismatches = 57 Conservative Sub.s = 35 

90 120 150 180 210 240 270 300 

LMSLF*DPQVSGEELDKFTVRLDSHRKSNSRG*NQLVI ILRLYSQIN*REPNMLVGPFLNKGEHMTQDYI CYL* SRGGED 

MEV 

330 360 390 420 450 480 510 540 

MHNILRFLGIVIISAVILFSIGSFYDLTLMKNILLICWSFLFDLLVFVFKjQRQTTEVLTWYQVVKQFWLFIKCTILIPIL 
: |,: :: ;|:|: :|:| :::: |: :| ||, | | : : : : : :| : |: || 

FNMKYRLFFVIFLSSVLDILLGTFLQISIVSIGWLvLYSGLFEAGWLLANKGVAVKIKEVDIRNRFKFIFGKTLWFQIL 
20 30 40 50 60 70 80 

570 600 630 660 690 720 750 780 

VAFIIMKGCLTSISDIL1YFYLHLVWYYTIGMILSLGRIISPEHSMFNKLRK*NELYLKFVFNRADLTICCLPCLS*FF 
: :: : || || |: ,|: : :|| :: :: : : 

LLIFLIIKLYLGLDARLILFYGHIFIVFNALMYLLSSSQVSLKKNKLSS 
100 110 120 130 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2994 

A DNA sequence <SEQ ID 9033> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9034>. Analysis of the amino acid sequence reveals the following: 

Lipop Possible site: -1 Crend: 9 

McG: Discrim Score: 3.25 

GvH: Signal Score (-7.5): -3.39 
Possible site: 59 

>>> Seems to have an uncleavable N-term signal seq 

ALOM program count: 4 value: -6.64 threshold: 0.0 

INTEGRAL Likelihood = -6.64 Transmembrane 46 - 62 ( 43 - 64) 
INTEGRAL Likelihood = -5.15 Transmembrane 17 - 33 ( 15 - 34) 
PERIPHERAL Likelihood = 11.03 100 
modified ALOM score: 1.83 

*** Reasoning Step: 3 



Final Results 
bacterial membrane Certainty=0.3654 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

35.5/63.8% over 127aa 

OMNI |NT01BS4455 | wall teichoic acid glycosylation protein GtcA Insert characterized 

ORF01715(343 - 750 of 1053) 

OMNI |NT01BS4455 (58 - 185 of 187) wall teichoic acid glycosylation protein GtcA 
%Match =8.0 

%Identity =35.5 %Similarity = 63.7 

Matches = 44 Mismatches = 39 Conservative Sub.s = 35 


210 240 270 300 330 360 390 411 

GN*ASRW*NNLLSISQTKSKAKLMGDFLITLKHP* YNKNMVKLKSLLKKSIQNEVSLYLLFGLLTSLLYLV- - - IRQGI 
: : : : :| | : :: |: :|:: |::|::: : | | 

PRRNHQTIICIGPASHLPQLFRRTLGIFyFRQRAREAKNFEKFFRKRGTSVKYREIIMYIIMGVFTTIVNIASFYILVEI 
20 20 30 40 50 60 70 80 

441 471 501 531 549 579 609 639 

FNFSQDAPFSAI VANIIAILFAFFTNDRFVFKQTKIEQLQRL QTFVIARLGTLGLDLILAVIFVDQFPSIIGQFVQ 

I I = = I h = :|||: II :||:| 1111 11= =11 = 11 = =hl II 

25 MNVDYKA--ATVAAWILSVLFAYITNKLYVFQQ-KTHDLQSLLKELTAFFSVRVLSLGIDLGMMIILVGQF 

100 110 120 130 140 150 

669 690 720 750 780 810 840 870 

HNIiNKINTIESL VSQILIILLNYILSKFVIFKDKKRQL*QELSFLiIFLLWIFG*ET*YLHALIQFFLSQFLERWHSV 

30 || |:| : :|:::||: ||:::|| 1 : 

NTNETLAKILDNAVIVvWYVASKWLVFKKTKEEGV 

160 170 180 

SEQ ID 9034 (GBS283) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 63 (lane 8; MW 67.6kDa). 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2995 

A DNA sequence <SEQ ID 9035> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9036>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
SRCFLG: 0 
McG: Length of UR: 22 
Peak Value of UR: 3.86 
Net Charge of CR: 2 
McG: Discrim Score: 16.84 
GvH: Signal Score (-7.5): -4.38 
Possible site: 21 
>>> Seems to have an uncleavable N-term signal seq 
Amino Acid Composition: calculated from 1 
ALOM program count: 1 value: -12.37 threshold: 0.0 
INTEGRAL Likelihood =-12.37 Transmembrane 7 - 23 ( 1 - 26) 
PERIPHERAL Likelihood = 12.84 64 
modified ALOM score: 2.97 
icml HYPID: 7 CFP: 0.595 

*** Reasoning Step: 3 






Final Results 

bacterial membrane Certainty=0.5946 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

SEQ ID 9036 (GBS286) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 11; MW 16.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 59 (lane 2; MW 41.3kDa) and in Figure 
63 (lane 9; MW 41.4kDa). 

The GBS286-GST fusion product was purified (Figure 210, lane 9; Figure 225, lane 9) and used to 
immunise mice. The resulting antiserum was used for FACS (Figure 274), which confirmed that the protein 
is immunoaccessible on GBS bacteria. 

GBS668 was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 139 (lane 2-4; MW 43.5kDa) and in Figure 187 (lane 6; MW 43kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 139 
(lane 6 & 7; MW 18.6kDa) and in Figure 179 (lane 12; MW 19kDa). 

GBS668-GST was purified as shown in Figure 237 (lane 10). GBS668-His was purified as shown in Figure 
231 (lanes 5 & 6). 

GBS673 was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown 
in Figure 161 (lane 8-10; MW 17kDa) and in Figure 188 (lane 4; MW 17kDa). It was also expressed in 
E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 162 (lane 8; 
MW 41.5kDa) and in Figure 239 (lane 7; MW 41kDa). Purified GBS673-His is shown in Figure 242, lane 
5. Purified GBS673-GST is shown in Figure 246, lane 2. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2996 

A DNA sequence <SEQ ID 9037> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9038>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: -18.42 
GvH: Signal Score (-7.5): -6.16 
Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 2 value: -8.49 threshold: 0.0 

INTEGRAL Likelihood = -8.49 Transmembrane 51 - 67 ( 44 - 95) 
INTEGRAL Likelihood = -3.08 Transmembrane 70 - 86 ( 68 - 95) 
PERIPHERAL Likelihood = 12.89 32 

modified ALOM score: 2.20 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 .4397 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

SEQ ID 9038 (GBS386) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 70 (lane 2; MW 14kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 8; MW 39.5kDa). 
GBS386-GST was purified as shown in Figure 213, lane 8. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2997 

A DNA sequence <SEQ ID 9039> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9040>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -15.47 
GvH: Signal Score (-7.5): -6.21 
Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 2 value: -3.61 threshold: 0.0 

INTEGRAL Likelihood = -3.61 Transmembrane 94 - 110 ( 94 - 111) 
INTEGRAL Likelihood = -1.70 Transmembrane 75 - 91 ( 75 - 91) 
PERIPHERAL Likelihood = 5.94 139 

modified ALOM score: 1.22 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.2444 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF0148K394 - 720 of 1065) 

GP| 9657521 |gb|AAF96047.1 | |AE004354 (16 - 121 of 243) uridine phosphorylase {Vibrio cholerae} 
%Match =5.3 

%Identity =28.0 %Similarity =48.6 
Matches = 30 Mismatches = 54 Conservative Sub.s = 22 

150 180 210 240 270 300 330 360 

V*KHMV*AI*YGNLP*KW*IVPLSIFIFANLTLPFKFH*VKIEKIFLTR**NIVN*GLKEMLMIINSFDNSRKAIINPED 

35 MSIQ 

390 420 450 480 510 540 570 600 

INSPIKGFPKTVITCFARETFNRILEELPHREIARTSVANLEIPiyELEFKGQKIGFFNAYVGASACVAILEDIIVFGME 

hi III I : I I I :: II: I I = =11 = = =l== I = 

40 PHIHVAQVAPRVWCGEPNRANRIASLLNNAE LVAENREYRLFSGEFEEQPITVCSTGIGAPSMI IAVEELARSGAK 

20 30 40 50 60 70 80 

630 660 690 720 750 780 810 840 

SLIVFGTCGVLDSSIEETSIIIPRSAIRDEGTSFHYSEASSEIAvNTNSIFLLCG*FRCRSMGSKIFRK*RGFRKER*NC 
45 ::: |: I : I I :|: |:|||| | | |: 

AIVRVGSAGAMQSEIGLGELILVEGAVRDEGGSKAYIGAAYPAYSSFELWEMQRFLAEQSVPIHRGIVRSHDSFYTDEE 
100 110 120 130 140 150 160 

SEQ ID 9040 (GBS388) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 70 (lane 3; MW 21kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 9; MW 45.6kDa). 

The GBS388-GST fusion product was purified (Figure 213, lane 10) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 311), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 
Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2998 

A DNA sequence <SEQ ID 9041> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9042>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -11.81 
GvH: Signal Score (-7.5): -7.49 
Possible site: 25 
>>> Seems to have no N-terminal signal sequence 

ALOM program count: 1 value: -5.68 threshold: 0.0 

INTEGRAL Likelihood = -5.68 Transmembrane 78 - 94 ( 77 - 95) 
PERIPHERAL Likelihood = 4.61 134 
modified ALOM score: 1.64 


*** Reasoning Step: 3 






Final Results 

bacterial membrane Certainty=0.3272 (Affirmative) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 

ORF01912(307 - 720 of 1056) 
GP|3845252|gb|AAC71927.1| |AE001412(81 - 242 of 244) hypothetical protein {Plasmodium 
falciparum} PIR|D71608|D71608 hypothetical protein PFB0690w - malaria parasite (Plasmodium 
falciparum) 
%Match =4.0 

%Identity =31.2 %Similarity =53.5 
Matches = 45 Mismatches = 58 Conservative Sub.s = 32 

231 261 291 348 378 405 

KKGRFLIDLYCISn7MNFKNSKIA*NQCFDV**RVvNHLL^ 

|| I : I :: : |: I I : =11 I I II I 
3 5 ONELQSLLSKEEEKYDFVKNELGDLQKQKDLLKWHLCNNI KKLSMKRSDYKFKTETKSKLESKLKSLKDMNKIHKFEHD 

60 70 80 90 100 110 120 

450 480 501 531 558 588 618 

ELDSKGWSKKDSRTIKILYDGLINK HIVSLDRADYNII-QVIPFANVHVLLFLIPERENSKNYRIY 



TLEELVHKMEQELETKMYIKND IENIFNECINKKDEYLKDITQERISVFKERKKRQNQLQKLLLIMKQENNKNYNIN 

140 150 160 170 180 190 200 

648 672 693 720 750 780 810 840 

45 NYSDYEMELINE- -DRQQFSKYET- - - VDL-DQLILVDIFNIDDYISSYLTI*DIENLDLGLLKLINYADNKSDRHILQT 

II |:|| = = =11 =11 I I I:: 
YLKKYESNLMNEINSYKNYKDFETKIAMDLIDDHSLNDLYVT 
220 230 240 

A related DNA sequence <SEQ ID 10589> was identified in GBS which encodes amino acid sequence 
<SEQ ID 10590>. 

SEQ ID 9042 (GBS408) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 6; MW 20.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 171 (lane 5; MW 45.3kDa). 
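
Where a protein in this section was expressed with both tags, the reported GST-fusion mass lies roughly 25 kDa above the His-fusion mass. The sketch below tabulates that offset from the molecular weights quoted in the text; the dictionary and function names are illustrative, and the check is a plausibility test for the reported figures rather than part of the original method.

```python
# (His-fusion MW, GST-fusion MW) in kDa, as quoted in this section.
reported_mw = {
    "GBS168": (7.6, 32.6),
    "GBS55": (11.3, 36.3),
    "GBS286": (16.4, 41.3),
    "GBS386": (14.0, 39.5),
    "GBS388": (21.0, 45.6),
    "GBS408": (20.4, 45.3),
}


def tag_offsets(pairs):
    """Return the GST-minus-His mass difference (kDa) for each protein."""
    return {name: round(gst - his, 1) for name, (his, gst) in pairs.items()}


offsets = tag_offsets(reported_mw)
mean_offset = sum(offsets.values()) / len(offsets)
for name, diff in offsets.items():
    print(name, diff)
print("mean offset:", round(mean_offset, 2))
```

A pair whose offset fell far outside this range would suggest a transcription error in one of the two quoted masses.
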

GBS408-GST was purified as shown in Figure 218, lane 9. 
Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 2999 

A DNA sequence <SEQ ID 9043> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9044>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -9.62 
GvH: Signal Score (-7.5): -4.84 
Possible site: 61 
>>> Seems to have no N-terminal signal sequence 

ALOM program count: 2 value: -11.09 threshold: 0.0 

INTEGRAL Likelihood =-11.09 Transmembrane 45 - 61 ( 37 - 72) 
INTEGRAL Likelihood = -8.60 Transmembrane 76 - 92 ( 70 - 97) 
PERIPHERAL Likelihood = 11.62 95 
modified ALOM score: 2.72 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5437 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF01977(442 - 627 of 948) 

EGAD|88220|96064 (204 - 583 of 751) hypothetical 84.8 kDa protein f23f12.5 in chromosome iii 
{Caenorhabditis elegans} SP|P46501|YLX5_CAEEL HYPOTHETICAL 84.8 KDA PROTEIN F23F12.5 IN 
CHROMOSOME III. GP|529214|gb|AAA20607.1| |U12965 F23F12.5 gene product {Caenorhabditis 
elegans} 

%Match =4.6 

%Identity =35.9 %Similarity = 59.4 

Matches = 23 Mismatches = 24 Conservative Sub.s = 15 

132 222 252 282 312 342 372 402 

35 DFVSSFFIS*SQTNYNRISFLLKLAKHQLECLNNVAQGLSV**YSSMKDYINRILHFIKEHMTYHWF 

VTLSAYFPFTITVERYYAMNKSEKYEKMPIILGPLFVLFIVKLELKIKDKVTLFQVIvNFGVIFQIYKNETFSHGDVAFS 
120 130 140 150 160 170 180 190 

40 432 462 492 522 552 

SNIHLRFWTTIIAYLVIFILSISTVILNLVLLFQGFLTQNPIIYLLFFITLVCAFY 



45 



LYPPGTAEKVFTFYvVLFLINLLDVMFNLvLLQMSFLNTNRFHWLCFFLWQFALFFCCQQIFSIFYNFSPGLSCDD 

200 210 220 230 240 250 260 

600 627 657 687 717 747 

FAYKFITYTPTIVKNAL-QYIKKLIOSIV*NNKVICTLTLYQLCFRVFFHTKITKKDSYLTI 



AGNFYLSQFVSGAVTAFAKIFVFLLDTYVPSFDRRRLHQYPQIAMILCYCVIMVLMILPESDCGSQGSRDLAIIIINIIG 
50 560 570 580 590 600 610 620 

SEQ ID 9044 (GBS411) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 78 (lane 2; MW 16kDa). 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 
Example 3000 

A DNA sequence <SEQ ID 9045> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9046>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: -17.94 
GvH: Signal Score (-7.5): -4.63 
Possible site: 45 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -6.10 threshold: 0.0 
INTEGRAL Likelihood = -6.10 Transmembrane 31 - 47 ( 26 - 49) 
PERIPHERAL Likelihood = 15.33 3 
modified ALOM score: 1.72 






*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0 . 3442 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 



ORF01982(313 - 501 of 801) 

GP|2444082|gb|AAC79518.1| |U88974(93 - 156 of 156) ORF2 {Streptococcus thermophilus 
temperate bacteriophage O1205} PIR|T13290|T13290 hypothetical protein 2 - Streptococcus 
phage phi-O1205 

%Match =11.5 

%Identity =48.4 %Similarity = 59.4 

Matches =31 Mismatches = 25 Conservative Sub.s = 7 

30 174 204 234 264 294 324 354 384 

DVDQNIESHKLFKRHFV*RAILPQSKRK*EN**LCVISEPR*KLKSKLGELKMGFFAQRCPYCQSTKVQFMNQDRKGFNG 

I :|IH:]I Ml I II 1 = 
LLMFVGVALLFARLFWEIKHPMTKEQKEQLKIERAKAKEEFRKSRNEFKKAMAEARAVKCPYCKSTDVEFMVQQRKSFSI 
50 60 70 80 90 100 110 



414 441 471 501 531 561 591 621 

CVGCIGFLIAWPF-LLLGLVGKKGKNNWHCTNCGRTFKTK*KSPTLKFCPRRA*GKF*YSKNLLFGRGFYHTYFNRK*GI 



GKAAAGTIMTGGVGALAGFAGKKGKKEWHCKNCGAVFTTK 
40 130 140 150 

SEQ ID 9046 (GBS412) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 6; MW 36kDa). Purified GBS412-GST is shown in Figure 218, lane 
10-11. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 3001 

A DNA sequence <SEQ ID 9047> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9048>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: 3.67 
GvH: Signal Score (-7.5): -3.62 
Possible site: 41 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 5 value: -7.27 threshold: 0.0 

INTEGRAL Likelihood = -7.27 Transmembrane 48 - 64 ( 32 - 68) 
INTEGRAL Likelihood = -6.26 Transmembrane 87 - 103 ( 85 - 105) 
INTEGRAL Likelihood = -6.21 Transmembrane 29 - 45 ( 26 - 46) 
INTEGRAL Likelihood = -3.29 Transmembrane 110 - 126 ( 109 - 130) 
INTEGRAL Likelihood = -2.87 Transmembrane 2 - 18 ( 1 - 18) 
PERIPHERAL Likelihood = 4.24 66 
modified ALOM score: 1. 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0. 3909 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF01286(304 - 672 of 993) 

GP|8272442|dbj|BAA96471.1| |AB036428(90 - 212 of 218) type IV prepilin peptidase homologue 
{Streptococcus mutans} 
%Match =16.8 

%Identity =46.3 %Similarity = 72.4 

Matches = 57 Mismatches = 34 Conservative Sub.s = 32 



102 132 162 192 222 252 282 312 

*RRLPI*T*IPNFFKRFCTSNKTFIYEF*QKTIQFSRKSATAC*LSL*R*TDYL**KS*SLFYHFSNININYKKDFMIMS 

LGSFFGLWDRYPQKSIIFPRSHCNKCYNCLTMRDLIPIFSRIINKNSCRFCGYPIPLRYSLVELLCGLISTGFALDLLT 
30 40 50 60 70 80 90 



342 372 402 432 462 492 522 552 

TIYFISLCTSFILSYYDIKYQEYPIFLWILFTISTIILTPITKVSIVLCLFGILREVVDINIGSGDFLYLATIGLSLPLH 

I I I =11 l|:= I 11= III II = = |= = l = = l lllhl : =11111111 = 1111= III 1 = 

TSQVCLLFMGVLLSLYDLQDQSYPLTLWIGFTFLLMFIYPLNLISLILFLFGIFAALKNINIGSGDFFYLATLALSLNLQ 
110 120 130 140 150 160 170 

582 612 642 672 702 732 762 792 

QMLFIIQIGAWLGIIYCLWIRKMKKTIAFLPFLSIAYIIOTSYSLLF*SL*IIRKVIKLWVLVAFWIFRMTNCTTKINHG 

l===llll = 111=1 1= =1 1= 11=111 = ==l= 
QI IWI IQIASLLGILYSLLFQKHKEPFAFVPFLFLGHLI 1 1 FSHLI 
190 200 210 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 3002 

A DNA sequence <SEQ ID 9049> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9050>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 10.43 
GvH: Signal Score (-7.5): -4.39 



Possible site: 54 
>>> Seems to have an uncleavable N-term signal seq 



ALOM program count: 4 value: -10.30 threshold: 0.0 

INTEGRAL Likelihood =-10.30 Transmembrane 62 - 78 ( 59 - 84) 
INTEGRAL Likelihood = -6.10 Transmembrane 4 - 20 ( 1 - 22) 
INTEGRAL Likelihood = -4.25 Transmembrane 128 - 144 ( 123 - 145) 
INTEGRAL Likelihood = -3.13 Transmembrane 88 - 104 ( 87 - 104) 
PERIPHERAL Likelihood = 2.01 109 

modified ALOM score: 2.56 



*** Reasoning Step: 3 



Final Results 

bacterial membrane - 
bacterial outside - 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

SP|Q48729|LSPA_LACLC(1 - 149 of 150) LIPOPROTEIN SIGNAL PEPTIDASE (EC 3.4.23.36) 
(PROLIPOPROTEIN SIGNAL PEPTIDASE) (SIGNAL PEPTIDASE II) (SPASE II). 

%Match =16.3 

%Identity =40.7 %Similarity =66.0 

Matches = 61 Mismatches = 50 Conservative Sub.s = 38 

10 180 210 240 270 300 330 360 390 

EWYHCYS IRRSSR* PNDLYQTKRS * FI SDGFKI CCCYGRVF*GI * FIGEVMRKI 1 1 PI ITILLIALDQLSKLWIVKHIEL 

hl = : =1 = = I 11= I 1 = 1 =1 = 1 
MKKLLSLVI IWGI IADQVFKNWWANI QL 
10 20 30 

15 

420 450 480 510 540 570 600 630 

NQIKEFIPNIVSLTYLRNYGAAFSILQNQQWLFTLITIFWGVAIIYLMKHINGSYWLLISLTLIISGGLGNFIDRLRLG 

1= l===llll==l 111=1 = 111=1 ==l 1= 11= =1 I 11= 11111=1 111== 1=1 I 
GDTKKIWPDVLSLTYIKNDGAAWSSFSGQQWFFLVLTPIVLIVALWFLWKK-MGQNWYFAGLTLIIAGALGNLLTRVRQG 
20 40 50 60 70 80 90 100 

660 690 720 750 780 810 840 870 

YVVDMvHLDFINFAIFNVADSYLTIGIICLMIALWKEESNGNHN*NSRS*AR*SFSG*F* 

=1111 =1 111=11 l==l = 1 11= == 

25 FVVDMFXNRIYDFPIFNIADILLSVGFWLFIAILTDKETK 
120 130 140 150 

There is also homology to SEQ ID 7750. 

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful 
antigens for vaccines and/or diagnostics. 

Example 3003 

A DNA sequence <SEQ ID 9051> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9052>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
35 McG: Discrim Score: 13.24 

GvH: Signal Score (-7.5): -2.18 

Possible site: 19 
>>> Seems to have a cleavable N-term signal seq. 
ALOM program count: 0 value: 2.01 threshold: 0.0 
40 PERIPHERAL Likelihood =2.01 21 

modified ALOM score: -0.90 

*** Reasoning Step: 3 

45 Final Results 

bacterial outside Certainty=0. 3000 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

50 The protein has no homology with any sequences in the databases. 
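
A "Final Results" block such as the one above can be read into a structured form for comparison across examples. The following is a minimal Python sketch, assuming each line has the layout "bacterial <location> Certainty=<value> (<verdict>)"; the names `parse_final_results` and `best_location` are illustrative and are not part of the prediction program. The pattern tolerates stray spaces inside the number, which scanning can introduce.

```python
import re

# Assumed layout: "bacterial <location> ... Certainty=<value> (<verdict>)".
LINE = re.compile(
    r"bacterial\s+(?P<location>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[0-9][\s.0-9]*)\((?P<verdict>[^)]*)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each location to a (certainty, verdict) pair."""
    results = {}
    for match in LINE.finditer(text):
        # Remove any whitespace scattered through the number.
        raw = "".join(match.group("value").split())
        results[match.group("location")] = (
            float(raw),
            match.group("verdict").strip(),
        )
    return results


def best_location(results: dict) -> str:
    """Return the location with the highest certainty."""
    return max(results, key=lambda name: results[name][0])


example = """
bacterial outside Certainty=0. 3000 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
"""
parsed = parse_final_results(example)
print(parsed["outside"])      # (0.3, 'Affirmative')
print(best_location(parsed))  # outside
```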

SEQ ID 9052 (GBS138) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 19 (lane 2; MW 15kDa).

GBS672 was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown
in Figure 161 (lanes 5-7; MW 15kDa) and in Figure 188 (lane 3; MW 15kDa). Purified protein is shown in
Figure 242, lane 4.

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful
antigens for vaccines and/or diagnostics.

Example 3004 

A DNA sequence <SEQ ID 9053> was identified in S.agalactiae which encodes amino acid sequence
<SEQ ID 9054>. Analysis of the amino acid sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 18.01
GvH: Signal Score (-7.5): -2.35
Possible site: 26
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 14.80 threshold: 0.0
PERIPHERAL Likelihood = 14.80 51
modified ALOM score: -3.46

*** Reasoning Step: 3

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

SEQ ID 9054 (GBS143) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 23 (lane 2; MW 33.5kDa).

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful
antigens for vaccines and/or diagnostics.

Example 3005 

A DNA sequence <SEQ ID 9055> was identified in S.agalactiae which encodes amino acid sequence 
<SEQ ID 9056>. Analysis of the amino acid sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0
McG: Discrim Score: 7.43
GvH: Signal Score (-7.5): -6.25
Possible site: 41
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -10.77 threshold: 0.0
INTEGRAL Likelihood = -10.77 Transmembrane 2 - 18 ( 1 - 20)
PERIPHERAL Likelihood = 5.14 29
modified ALOM score: 2.65
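
In the ALOM lines above, the value of -10.77 lies below the threshold of 0.0 and the segment is reported as INTEGRAL, whereas the positive values in Examples 3003 and 3004 are reported as PERIPHERAL. The following is a minimal Python sketch of reading INTEGRAL lines and applying that comparison. The rule "integral when the value is below the threshold" is an assumption consistent with these listings rather than a statement in the text, and the names `Segment`, `parse_integral_segments` and `classify` are illustrative.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_segments(text: str) -> List[Segment]:
    """Extract (likelihood, start, end) for each INTEGRAL line."""
    return [
        Segment(float(value), int(start), int(end))
        for value, start, end in INTEGRAL.findall(text)
    ]


def classify(value: float, threshold: float = 0.0) -> str:
    """Assumed rule: values below the threshold are called INTEGRAL."""
    return "INTEGRAL" if value < threshold else "PERIPHERAL"


listing = """
ALOM program count: 1 value: -10.77 threshold: 0.0
INTEGRAL Likelihood =-10.77 Transmembrane 2 - 18 ( 1 - 20)
PERIPHERAL Likelihood =5.14 29
"""
segments = parse_integral_segments(listing)
print(segments)          # [Segment(likelihood=-10.77, start=2, end=18)]
print(classify(-10.77))  # INTEGRAL
print(classify(2.01))    # PERIPHERAL
```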

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5310 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

SEQ ID 9056 (GBS229) was expressed in E. coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 55 (lane 3; MW 35.9kDa).

GBS229-GST was purified as shown in Figure 206, lane 5.

Based on this analysis, it is predicted that this protein from S.agalactiae, and its epitopes, could be useful
antigens for vaccines and/or diagnostics.

Example 3006

A DNA sequence <SEQ ID 9183> was identified in GAS which encodes amino acid sequence <SEQ ID 
9184>. Analysis of the amino acid sequence reveals the following: 

Possible site: 29
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3007

A DNA sequence <SEQ ID 9185> was identified in GAS which encodes amino acid sequence <SEQ ID 
9186>. Analysis of the amino acid sequence reveals the following: 

Possible site: 28
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3008 

A DNA sequence <SEQ ID 9187> was identified in GAS which encodes amino acid sequence <SEQ ID
9188>. Analysis of the amino acid sequence reveals the following:

Possible site: 36
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.70 Transmembrane 850 - 866 ( 850 - 866)
INTEGRAL Likelihood = -1.22 Transmembrane 15 - 31 ( 15 - 31)

Final Results

bacterial membrane Certainty=0.1680 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3009

A DNA sequence <SEQ ID 9189> was identified in GAS which encodes amino acid sequence <SEQ ID 
9190>. Analysis of the amino acid sequence reveals the following: 

LPXTG motif: 259-263
Possible site: 13
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.93 Transmembrane 270 - 286 ( 268 - 288)

Final Results

bacterial membrane Certainty=0.2572 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.
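
Motif coordinates such as "LPXTG motif: 259-263" above are residue positions within the encoded amino acid sequence. The following is a minimal Python sketch of locating such motifs and reporting them in the same 1-based, inclusive form. The peptide `toy` is a hypothetical string invented for illustration and is not taken from the sequence listing; the function name `find_motif` is likewise illustrative.

```python
import re
from typing import List, Tuple


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each match."""
    regex = re.compile(pattern)
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


LPXTG = r"LP.TG"  # X stands for any residue
RGD = r"RGD"

# Hypothetical peptide for illustration only.
toy = "MKKAALPETGEERGDLLLVVV"

print(find_motif(toy, LPXTG))  # [(6, 10)]
print(find_motif(toy, RGD))    # [(13, 15)]
```

A five-residue motif therefore spans five consecutive positions, as in 259-263, and a three-residue motif spans three.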

Example 3010 

A DNA sequence <SEQ ID 9191> was identified in GAS which encodes amino acid sequence <SEQ ID 
9192>. Analysis of the amino acid sequence reveals the following: 

Possible site: 21
>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3011

A DNA sequence <SEQ ID 9193> was identified in GAS which encodes amino acid sequence <SEQ ID 
9194>. Analysis of the amino acid sequence reveals the following: 

Possible site: 29
>>> May be a lipoprotein

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3012 

A DNA sequence <SEQ ID 9195> was identified in GAS which encodes amino acid sequence <SEQ ID
9196>. Analysis of the amino acid sequence reveals the following:

Possible site: 34
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3013

A DNA sequence <SEQ ID 9197> was identified in GAS which encodes amino acid sequence <SEQ ID 
9198>. Analysis of the amino acid sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.50 Transmembrane 346 - 362 ( 343 - 366)
INTEGRAL Likelihood = -2.97 Transmembrane 177 - 193 ( 176 - 195)

Final Results

bacterial membrane Certainty=0.2402 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3014 

A DNA sequence <SEQ ID 9199> was identified in GAS which encodes amino acid sequence <SEQ ID 
9200>. Analysis of the amino acid sequence reveals the following: 

Possible site: 19
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.33 Transmembrane 24 - 40 ( 24 - 40)

Final Results

bacterial membrane Certainty=0.1532 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3015

A DNA sequence <SEQ ID 9201> was identified in GAS which encodes amino acid sequence <SEQ ID
9202>. Analysis of the amino acid sequence reveals the following:

Possible site: 33
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -6.00 Transmembrane 194 - 210 ( 192 - 214)

Final Results

bacterial membrane Certainty=0.3399 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

LPXTG motif: 183-187

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3016 

A DNA sequence <SEQ ID 9203> was identified in GAS which encodes amino acid sequence <SEQ ID 
9204>. Analysis of the amino acid sequence reveals the following: 

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -11.25 Transmembrane 9 - 25 ( 4 - 28)

Final Results

bacterial membrane Certainty=0.5501 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.
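
In the membrane predictions of the preceding examples, the reported certainty tracks the most negative INTEGRAL likelihood closely. The following is a minimal Python sketch that checks an empirical linear fit, certainty ≈ 0.1 - 0.04 × likelihood, against value pairs copied from Examples 3005, 3008, 3009, 3015 and 3016. The fit is only an observation about these numbers; it is not a formula stated in the text or attributed to the prediction program, and the name `fitted_certainty` is illustrative.

```python
# (most negative INTEGRAL likelihood, reported membrane certainty)
pairs = [
    (-10.77, 0.5310),  # Example 3005
    (-1.70, 0.1680),   # Example 3008
    (-3.93, 0.2572),   # Example 3009
    (-6.00, 0.3399),   # Example 3015
    (-11.25, 0.5501),  # Example 3016
]


def fitted_certainty(likelihood: float) -> float:
    """Empirical fit observed in these listings (an assumption)."""
    return 0.1 - 0.04 * likelihood


# Largest absolute gap between the fit and the reported certainty.
max_error = max(
    abs(fitted_certainty(likelihood) - certainty)
    for likelihood, certainty in pairs
)
print(max_error < 0.0005)  # True
```

The small residuals are of the size expected from the likelihood being printed to only two decimal places.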

Example 3017 

A DNA sequence <SEQ ID 9205> was identified in GAS which encodes amino acid sequence <SEQ ID
9206>. Analysis of the amino acid sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.03 Transmembrane 462 - 478 ( 460 - 479)
INTEGRAL Likelihood = -0.90 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane Certainty=0.2211 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

LPXTG motif: 450-454

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3018 

A DNA sequence <SEQ ID 9207> was identified in GAS which encodes amino acid sequence <SEQ ID
9208>. Analysis of the amino acid sequence reveals the following:

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.60 Transmembrane 15 - 31 ( 12 - 32)

Final Results

bacterial membrane Certainty=0.2041 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3019

A DNA sequence <SEQ ID 9209> was identified in GAS which encodes amino acid sequence <SEQ ID 
9210>. Analysis of the amino acid sequence reveals the following: 

Possible site: 28
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 16 - 32 ( 16 - 32)

Final Results

bacterial membrane Certainty=0.1553 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3020 

A DNA sequence <SEQ ID 9211> was identified in GAS which encodes amino acid sequence <SEQ ID
9212>. Analysis of the amino acid sequence reveals the following:

Possible cleavage site: 24
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty= 0.300 (Affirmative) < suco
bacterial membrane Certainty= 0.000 (Not Clear) < suco
bacterial cytoplasm Certainty= 0.000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3021 

A DNA sequence <SEQ ID 9213> was identified in GAS which encodes amino acid sequence <SEQ ID 
9214>. Analysis of the amino acid sequence reveals the following: 

Possible cleavage site: 23
>>> May be a lipoprotein

Final Results

bacterial membrane Certainty= 0.000 (Not Clear) < suco
bacterial outside Certainty= 0.000 (Not Clear) < suco
bacterial cytoplasm Certainty= 0.000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3022 

A DNA sequence <SEQ ID 9215> was identified in GAS which encodes amino acid sequence <SEQ ID
9216>. Analysis of the amino acid sequence reveals the following:

Possible site: 19
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.76 Transmembrane 3 - 19 ( 2 - 20)

Final Results

bacterial membrane Certainty=0.2105 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

RGD motif: 396-398

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3023 

A DNA sequence <SEQ ID 9217> was identified in GAS which encodes amino acid sequence <SEQ ID
9218>. Analysis of the amino acid sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.80 Transmembrane 251 - 267 ( 251 - 267)
INTEGRAL Likelihood = -0.75 Transmembrane 179 - 195 ( 179 - 195)

Final Results

bacterial membrane Certainty=0.1319 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3024 

A DNA sequence <SEQ ID 9219> was identified in GAS which encodes amino acid sequence <SEQ ID
9220>. Analysis of the amino acid sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.22 Transmembrane 52 - 68 ( 51 - 68)

Final Results

bacterial membrane Certainty=0.1489 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3025 

A DNA sequence <SEQ ID 9221> was identified in GAS which encodes amino acid sequence <SEQ ID
9222>. Analysis of the amino acid sequence reveals the following:

Possible site: 52
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -12.58 Transmembrane 39 - 55 ( 32 - 86)
INTEGRAL Likelihood = -9.55 Transmembrane 60 - 76 ( 56 - 86)

Final Results

bacterial membrane Certainty=0.6031 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3026

A DNA sequence <SEQ ID 9223> was identified in GAS which encodes amino acid sequence <SEQ ID 
9224>. Analysis of the amino acid sequence reveals the following:

Possible site: 18
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3027 

A DNA sequence <SEQ ID 9225> was identified in GAS which encodes amino acid sequence <SEQ ID 
9226>. Analysis of the amino acid sequence reveals the following: 

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3028 

A DNA sequence <SEQ ID 9227> was identified in GAS which encodes amino acid sequence <SEQ ID 
9228>. Analysis of the amino acid sequence reveals the following: 

Possible site: 33
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.44 Transmembrane 18 - 34 ( 13 - 40)
INTEGRAL Likelihood = -7.86 Transmembrane 59 - 75 ( 54 - 79)

Final Results

bacterial membrane Certainty=0.4376 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3029 

A DNA sequence <SEQ ID 9229> was identified in GAS which encodes amino acid sequence <SEQ ID 
9230>. Analysis of the amino acid sequence reveals the following:

Possible site: 27
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < suco
bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3030 

A DNA sequence <SEQ ID 9231> was identified in GAS which encodes amino acid sequence <SEQ ID
9232>. Analysis of the amino acid sequence reveals the following:

Possible site: 24
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3031 

A DNA sequence <SEQ ID 9233> was identified in GAS which encodes amino acid sequence <SEQ ID 
9234>. Analysis of the amino acid sequence reveals the following: 

Possible site: 49
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -9.87 Transmembrane 58 - 74 ( 53 - 81)

Final Results

bacterial membrane Certainty=0.4949 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3032

A DNA sequence <SEQ ID 9235> was identified in GAS which encodes amino acid sequence <SEQ ID
9236>. Analysis of the amino acid sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.06 Transmembrane 92 - 108 ( 92 - 108)

Final Results

bacterial membrane Certainty=0.1022 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3033 

A DNA sequence <SEQ ID 9237> was identified in GAS which encodes amino acid sequence <SEQ ID
9238>. Analysis of the amino acid sequence reveals the following:

Possible site: 40
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.38 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane Certainty=0.1553 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3034

A DNA sequence <SEQ ID 9239> was identified in GAS which encodes amino acid sequence <SEQ ID 
9240>. Analysis of the amino acid sequence reveals the following: 

Possible site: 19
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3035 

A DNA sequence <SEQ ID 9241> was identified in GAS which encodes amino acid sequence <SEQ ID
9242>. Analysis of the amino acid sequence reveals the following:

Possible site: 57
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.01 Transmembrane 155 - 171 ( 154 - 171)

Final Results

bacterial membrane Certainty=0.1404 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3036 

A DNA sequence <SEQ ID 9243> was identified in GAS which encodes amino acid sequence <SEQ ID 
9244>. Analysis of the amino acid sequence reveals the following:

Possible site: 28
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -4.25 Transmembrane 113 - 129 ( 111 - 131)

Final Results

bacterial membrane Certainty=0.2699 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3037 

A DNA sequence <SEQ ID 9245> was identified in GAS which encodes amino acid sequence <SEQ ID 
9246>. Analysis of the amino acid sequence reveals the following: 

Possible site: 56
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.69 Transmembrane 110 - 126 ( 110 - 126)

Final Results

bacterial membrane Certainty=0.1277 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3038

A DNA sequence <SEQ ID 9247> was identified in GAS which encodes amino acid sequence <SEQ ID 
9248>. Analysis of the amino acid sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.28 Transmembrane 130 - 146 ( 128 - 146)

Final Results

bacterial membrane Certainty=0.1510 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3039 

A DNA sequence <SEQ ID 9249> was identified in GAS which encodes amino acid sequence <SEQ ID
9250>. Analysis of the amino acid sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -4.57 Transmembrane 74 - 90 ( 72 - 92)
INTEGRAL Likelihood = -3.13 Transmembrane 169 - 185 ( 166 - 185)
INTEGRAL Likelihood = -3.13 Transmembrane 28 - 44 ( 27 - 44)

Final Results

bacterial membrane Certainty=0.2826 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3040

A DNA sequence <SEQ ID 9251> was identified in GAS which encodes amino acid sequence <SEQ ID
9252>. Analysis of the amino acid sequence reveals the following:

Possible cleavage site: 56
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -12.21 Transmembrane 93 - 109 ( 87 - 114)
INTEGRAL Likelihood = -8.65 Transmembrane 227 - 243 ( 226 - 243)

Final Results

bacterial membrane Certainty=0.588 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3041

A DNA sequence <SEQ ID 9253> was identified in GAS which encodes amino acid sequence <SEQ ID 
9254>. Analysis of the amino acid sequence reveals the following: 

Possible site: 45
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.3612 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3042 

A DNA sequence <SEQ ID 9255> was identified in GAS which encodes amino acid sequence <SEQ ID
9256>. Analysis of the amino acid sequence reveals the following:

Possible site: 44
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.5670 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3043 

A DNA sequence <SEQ ID 9257> was identified in GAS which encodes amino acid sequence <SEQ ID 
9258>. Analysis of the amino acid sequence reveals the following: 

Possible site: 51
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.5416 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3044 

A DNA sequence <SEQ ID 9259> was identified in GAS which encodes amino acid sequence <SEQ ID 
9260>. Analysis of the amino acid sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.76 Transmembrane 137 - 153 ( 137 - 154)

Final Results

bacterial membrane Certainty=0.2105 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3045 

A DNA sequence <SEQ ID 9261> was identified in GAS which encodes amino acid sequence <SEQ ID
9262>. Analysis of the amino acid sequence reveals the following:

Possible site: 36
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.91 Transmembrane 238 - 254 ( 236 - 264)
INTEGRAL Likelihood = -6.16 Transmembrane 69 - 85 ( 65 - 89)
INTEGRAL Likelihood = -6.00 Transmembrane 136 - 152 ( 134 - 155)
INTEGRAL Likelihood = -4.73 Transmembrane 29 - 45 ( 21 - 48)
INTEGRAL Likelihood = -2.97 Transmembrane 194 - 210 ( 193 - 220)

Final Results

bacterial membrane Certainty=0.4163 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3046

A DNA sequence <SEQ ID 9263> was identified in GAS which encodes amino acid sequence <SEQ ID 
9264>. Analysis of the amino acid sequence reveals the following: 

Possible site: 39
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -9.87 Transmembrane 574 - 590 ( 568 - 601)
INTEGRAL Likelihood = -9.18 Transmembrane 243 - 259 ( 238 - 262)
INTEGRAL Likelihood = -7.11 Transmembrane 66 - 82 ( 65 - 87)
INTEGRAL Likelihood = -1.28 Transmembrane 270 - 286 ( 270 - 287)

Final Results

bacterial membrane Certainty=0.4949 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3047

A DNA sequence <SEQ ID 9265> was identified in GAS which encodes amino acid sequence <SEQ ID 
9266>. Analysis of the amino acid sequence reveals the following: 

Possible site: 33
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.91 Transmembrane 98 - 114 ( 92 - 124)
INTEGRAL Likelihood = -6.21 Transmembrane 19 - 35 ( 14 - 37)
INTEGRAL Likelihood = -5.36 Transmembrane 170 - 186 ( 169 - 189)
INTEGRAL Likelihood = -5.15 Transmembrane 147 - 163 ( 136 - 167)
INTEGRAL Likelihood = -1.12 Transmembrane 77 - 93 ( 77 - 93)

Final Results

bacterial membrane Certainty=0.4163 (Affirmative) < suco
bacterial outside Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3048 

A DNA sequence <SEQ ID 9267> was identified in GAS which encodes amino acid sequence <SEQ ID
9268>. Analysis of the amino acid sequence reveals the following:

Possible site: 47 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -11.94 Transmembrane 27 - 43 ( 19 - 51)

INTEGRAL Likelihood = -4.83 Transmembrane 152 - 168 ( 151 - 171)

INTEGRAL Likelihood = -4.09 Transmembrane 277 - 293 ( 276 - 294)

INTEGRAL Likelihood = -3.82 Transmembrane 195 - 211 ( 193 - 217)

INTEGRAL Likelihood = -2.50 Transmembrane 120 - 136 ( 120 - 137)

INTEGRAL Likelihood = -0.64 Transmembrane 81 - 97 ( 81 - 98)


Final Results 

bacterial membrane Certainty=0.5776 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3049 

A DNA sequence <SEQ ID 9269> was identified in GAS which encodes amino acid sequence <SEQ ID 
9270>. Analysis of the amino acid sequence reveals the following: 

Possible site: 36 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.49 Transmembrane 27 - 43 ( 14 - 50)

INTEGRAL Likelihood = -8.17 Transmembrane 58 - 74 ( 52 - 79)

INTEGRAL Likelihood = -7.38 Transmembrane 165 - 181 ( 161 - 193)

INTEGRAL Likelihood = -3.66 Transmembrane 247 - 263 ( 246 - 270)

INTEGRAL Likelihood = -1.54 Transmembrane 134 - 150 ( 134 - 150)



Final Results 

bacterial membrane Certainty=0.440 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3050 

A DNA sequence <SEQ ID 9271> was identified in GAS which encodes amino acid sequence <SEQ ID
9272>. Analysis of the amino acid sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -14.75 Transmembrane 389 - 405 ( 377 - 413)


INTEGRAL Likelihood = -8.44 Transmembrane 31 - 47 ( 29 - 54)


INTEGRAL Likelihood = -7.17 Transmembrane 181 - 197 ( 179 - 205)

INTEGRAL Likelihood = -7.01 Transmembrane 339 - 355 ( 326 - 360)

INTEGRAL Likelihood = -6.58 Transmembrane 105 - 121 ( 102 - 124)

INTEGRAL Likelihood = -5.36 Transmembrane 225 - 241 ( 222 - 244)

INTEGRAL Likelihood = -0.43 Transmembrane 139 - 155 ( 139 - 155)

INTEGRAL Likelihood = -0.16 Transmembrane 283 - 299 ( 282 - 300)



Final Results 

bacterial membrane Certainty=0.6901 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3051 

A DNA sequence <SEQ ID 9273> was identified in GAS which encodes amino acid sequence <SEQ ID 
9274>. Analysis of the amino acid sequence reveals the following: 

Possible cleavage site: 25 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.31 Transmembrane 155 - 171 ( 154 - 174) 

INTEGRAL Likelihood = -3.50 Transmembrane 111 - 127 ( 110 - 128) 

INTEGRAL Likelihood = -2.07 Transmembrane 80 - 96 ( 78 - 96) 

INTEGRAL Likelihood = -0.90 Transmembrane 57 - 73 ( 57 - 74) 



Final Results 

bacterial membrane --- Certainty=0.312 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.



Example 3052 

A DNA sequence <SEQ ID 9275> was identified in GAS which encodes amino acid sequence <SEQ ID 
9276>. Analysis of the amino acid sequence reveals the following: 



Possible site: 27 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.93 Transmembrane 463 - 479 ( 461 - 480)



Final Results 

bacterial membrane --- Certainty=0.2572 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it is predicted that this GAS protein, and its epitopes, could be useful antigens for
vaccines and/or diagnostics.

Example 3053 

A DNA sequence <SEQ ID 8741> was identified in GBS which encodes amino acid sequence <SEQ ID
8742>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3054 

A DNA sequence <SEQ ID 8685> was identified in GBS which encodes amino acid sequence <SEQ ID 
8686>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3055 

A DNA sequence <SEQ ID 10303> was identified in GBS which encodes amino acid sequence <SEQ ID
10304>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3056 

A DNA sequence <SEQ ID 10305> was identified in GBS which encodes amino acid sequence <SEQ ID 
10306>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3057

A DNA sequence <SEQ ID 10307> was identified in GBS which encodes amino acid sequence <SEQ ID 
10308>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3058 

A DNA sequence <SEQ ID 10309> was identified in GBS which encodes amino acid sequence <SEQ ID 
10310>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3059 

A DNA sequence <SEQ ID 10311> was identified in GBS which encodes amino acid sequence <SEQ ID
10312>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3060 

A DNA sequence <SEQ ID 10313> was identified in GBS which encodes amino acid sequence <SEQ ID
10314>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3061 

A DNA sequence <SEQ ID 10315> was identified in GBS which encodes amino acid sequence <SEQ ID 
10316>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3062

A DNA sequence <SEQ ID 10317> was identified in GBS which encodes amino acid sequence <SEQ ID 
10318>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3063 

A repeated DNA sequence <SEQ ID 10319> was identified in GBS which encodes amino acid sequence 
<SEQ ID 10320>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3064

A DNA sequence <SEQ ID 10321> was identified in GBS which encodes amino acid sequence <SEQ ID 
10322>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3065 

A DNA sequence <SEQ ID 10323> was identified in GBS which encodes amino acid sequence <SEQ ID
10324>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3066 

A DNA sequence <SEQ ID 10325> was identified in GBS which encodes amino acid sequence <SEQ ID 
10326>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3067

A DNA sequence <SEQ ID 10327> was identified in GBS which encodes amino acid sequence <SEQ ID 
10328>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3068 

A DNA sequence <SEQ ID 10329> was identified in GBS which encodes amino acid sequence <SEQ ID 
10330>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3069 

A DNA sequence <SEQ ID 10331> was identified in GBS which encodes amino acid sequence <SEQ ID 
10332>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3070 

A DNA sequence <SEQ ID 10333> was identified in GBS which encodes amino acid sequence <SEQ ID
10334>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3071 

A DNA sequence <SEQ ID 10335> was identified in GBS which encodes amino acid sequence <SEQ ID 
10336>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3072

A DNA sequence <SEQ ID 10339> was identified in GBS which encodes amino acid sequence <SEQ ID 
10340>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3073 

A DNA sequence <SEQ ID 10341> was identified in GBS which encodes amino acid sequence <SEQ ID 
10342>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3074 

A DNA sequence <SEQ ID 10343> was identified in GBS which encodes amino acid sequence <SEQ ID 
10344>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3075 

A DNA sequence <SEQ ID 10345> was identified in GBS which encodes amino acid sequence <SEQ ID
10346>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3076

A DNA sequence <SEQ ID 10347> was identified in GBS which encodes amino acid sequence <SEQ ID 
10348>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3077 

A DNA sequence <SEQ ID 10349> was identified in GBS which encodes amino acid sequence <SEQ ID
10350>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3078 

A DNA sequence <SEQ ID 10351> was identified in GBS which encodes amino acid sequence <SEQ ID 
10352>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3079

A DNA sequence <SEQ ID 10353> was identified in GBS which encodes amino acid sequence <SEQ ID 
10354>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3080 

A DNA sequence <SEQ ID 10355> was identified in GBS which encodes amino acid sequence <SEQ ID 
10356>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3081 

A DNA sequence <SEQ ID 10357> was identified in GBS which encodes amino acid sequence <SEQ ID 
10358>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3082 

A DNA sequence <SEQ ID 10359> was identified in GBS which encodes amino acid sequence <SEQ ID
10360>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3083 

A DNA sequence <SEQ ID 10361> was identified in GBS which encodes amino acid sequence <SEQ ID 
10362>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3084

A DNA sequence <SEQ ID 10363> was identified in GBS which encodes amino acid sequence <SEQ ID 
10364>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3085 

A DNA sequence <SEQ ID 10365> was identified in GBS which encodes amino acid sequence <SEQ ID 
10366>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3086 

A DNA sequence <SEQ ID 10367> was identified in GBS which encodes amino acid sequence <SEQ ID 
10368>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3087 

A DNA sequence <SEQ ID 10369> was identified in GBS which encodes amino acid sequence <SEQ ID
10370>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3088

A DNA sequence <SEQ ID 10371> was identified in GBS which encodes amino acid sequence <SEQ ID 
10372>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3089 

A DNA sequence <SEQ ID 10373> was identified in GBS which encodes amino acid sequence <SEQ ID
10374>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3090 

A DNA sequence <SEQ ID 10375> was identified in GBS which encodes amino acid sequence <SEQ ID 
10376>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3091

A DNA sequence <SEQ ID 10377> was identified in GBS which encodes amino acid sequence <SEQ ID 
10378>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3092 

A DNA sequence <SEQ ID 10379> was identified in GBS which encodes amino acid sequence <SEQ ID 
10380>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3093 

A DNA sequence <SEQ ID 10381> was identified in GBS which encodes amino acid sequence <SEQ ID 
10382>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3094 

A DNA sequence <SEQ ID 10383> was identified in GBS which encodes amino acid sequence <SEQ ID
10384>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3095 

A DNA sequence <SEQ ID 10385> was identified in GBS which encodes amino acid sequence <SEQ ID 
10386>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3096

A DNA sequence <SEQ ID 10387> was identified in GBS which encodes amino acid sequence <SEQ ID 
10388>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3097 

A DNA sequence <SEQ ID 10389> was identified in GBS which encodes amino acid sequence <SEQ ID 
10390>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3098 

A DNA sequence <SEQ ID 10391> was identified in GBS which encodes amino acid sequence <SEQ ID 
10392>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3099 

A DNA sequence <SEQ ID 10393> was identified in GBS which encodes amino acid sequence <SEQ ID
10394>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3100

A DNA sequence <SEQ ID 10395> was identified in GBS which encodes amino acid sequence <SEQ ID 
10396>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3101 

A DNA sequence <SEQ ID 10397> was identified in GBS which encodes amino acid sequence <SEQ ID
10398>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3102 

A DNA sequence <SEQ ID 10399> was identified in GBS which encodes amino acid sequence <SEQ ID 
10400>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3103

A DNA sequence <SEQ ID 10401> was identified in GBS which encodes amino acid sequence <SEQ ID 
10402>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3104 

A DNA sequence <SEQ ID 10403> was identified in GBS which encodes amino acid sequence <SEQ ID 
10404>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3105 

A DNA sequence <SEQ ID 10405> was identified in GBS which encodes amino acid sequence <SEQ ID 
10406>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3106 

A DNA sequence <SEQ ID 10407> was identified in GBS which encodes amino acid sequence <SEQ ID
10408>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3107 

A DNA sequence <SEQ ID 10409> was identified in GBS which encodes amino acid sequence <SEQ ID 
10410>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3108

A DNA sequence <SEQ ID 10411> was identified in GBS which encodes amino acid sequence <SEQ ID
10412>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3109 

A DNA sequence <SEQ ID 10413> was identified in GBS which encodes amino acid sequence <SEQ ID 
10414>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3110 

A DNA sequence <SEQ ID 10415> was identified in GBS which encodes amino acid sequence <SEQ ID 
10416>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3111 

A DNA sequence <SEQ ID 10417> was identified in GBS which encodes amino acid sequence <SEQ ID
10418>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3112

A DNA sequence <SEQ ID 10419> was identified in GBS which encodes amino acid sequence <SEQ ID
10420>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3113 

A DNA sequence <SEQ ID 10421> was identified in GBS which encodes amino acid sequence <SEQ ID
10422>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3114 

A DNA sequence <SEQ ID 10423> was identified in GBS which encodes amino acid sequence <SEQ ID 
10424>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3115

A DNA sequence <SEQ ID 10425> was identified in GBS which encodes amino acid sequence <SEQ ID 
10426>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3116 

A DNA sequence <SEQ ID 10427> was identified in GBS which encodes amino acid sequence <SEQ ID 
10428>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3117 

A DNA sequence <SEQ ID 10429> was identified in GBS which encodes amino acid sequence <SEQ ID 
10430>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3118 

A DNA sequence <SEQ ID 10431> was identified in GBS which encodes amino acid sequence <SEQ ID
10432>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3119 

A DNA sequence <SEQ ID 10433> was identified in GBS which encodes amino acid sequence <SEQ ID 
10434>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3120

A DNA sequence <SEQ ID 10435> was identified in GBS which encodes amino acid sequence <SEQ ID 
10436>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3121 

A DNA sequence <SEQ ID 10437> was identified in GBS which encodes amino acid sequence <SEQ ID 
10438>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3122 

A DNA sequence <SEQ ID 10441> was identified in GBS which encodes amino acid sequence <SEQ ID 
10442>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3123 

A DNA sequence <SEQ ID 10443> was identified in GBS which encodes amino acid sequence <SEQ ID
10444>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3124

A DNA sequence <SEQ ID 10445> was identified in GBS which encodes amino acid sequence <SEQ ID 
10446>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3125 

A DNA sequence <SEQ ID 10447> was identified in GBS which encodes amino acid sequence <SEQ ID
10448>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3126 

A DNA sequence <SEQ ID 10449> was identified in GBS which encodes amino acid sequence <SEQ ID 
10450>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3127

A DNA sequence <SEQ ID 10451> was identified in GBS which encodes amino acid sequence <SEQ ID
10452>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3128 

A DNA sequence <SEQ ID 10453> was identified in GBS which encodes amino acid sequence <SEQ ID 
10454>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3129 

A DNA sequence <SEQ ID 10455> was identified in GBS which encodes amino acid sequence <SEQ ID 
10456>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3130 

A DNA sequence <SEQ ID 10457> was identified in GBS which encodes amino acid sequence <SEQ ID
10458>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10907> which encodes amino acid sequence <SEQ ID 10908> was 
also identified. 

Example 3131 

A DNA sequence <SEQ ID 10459> was identified in GBS which encodes amino acid sequence <SEQ ID
10460>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3132 

A DNA sequence <SEQ ID 10461> was identified in GBS which encodes amino acid sequence <SEQ ID 
10462>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3133

A DNA sequence <SEQ ID 10463> was identified in GBS which encodes amino acid sequence <SEQ ID 
10464>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3134 

A DNA sequence <SEQ ID 10465> was identified in GBS which encodes amino acid sequence <SEQ ID 
10466>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3135

A DNA sequence <SEQ ID 10467> was identified in GBS which encodes amino acid sequence <SEQ ID 
10468>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3136 

A DNA sequence <SEQ ID 10469> was identified in GBS which encodes amino acid sequence <SEQ ID
10470>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3137 

A DNA sequence <SEQ ID 10471> was identified in GBS which encodes amino acid sequence <SEQ ID 
10472>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3138

A DNA sequence <SEQ ID 10473> was identified in GBS which encodes amino acid sequence <SEQ ID 
10474>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3139 

A DNA sequence <SEQ ID 10475> was identified in GBS which encodes amino acid sequence <SEQ ID 
10476>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3140 

A DNA sequence <SEQ ID 10477> was identified in GBS which encodes amino acid sequence <SEQ ID 
10478>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3141 

A DNA sequence <SEQ ID 10479> was identified in GBS which encodes amino acid sequence <SEQ ID
10480>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3142 

A DNA sequence <SEQ ID 10481> was identified in GBS which encodes amino acid sequence <SEQ ID 
10482>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3143

A DNA sequence <SEQ ID 10483> was identified in GBS which encodes amino acid sequence <SEQ ID 
10484>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3144 

A DNA sequence <SEQ ID 10485> was identified in GBS which encodes amino acid sequence <SEQ ID 
10486>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3145 

A DNA sequence <SEQ ID 10487> was identified in GBS which encodes amino acid sequence <SEQ ID 
10488>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3146 

A DNA sequence <SEQ ID 10489> was identified in GBS which encodes amino acid sequence <SEQ ID
10490>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3147

A DNA sequence <SEQ ID 10491> was identified in GBS which encodes amino acid sequence <SEQ ID 
10492>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3148 

A DNA sequence <SEQ ID 10493> was identified in GBS which encodes amino acid sequence <SEQ ID
10494>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3149 

A DNA sequence <SEQ ID 10495> was identified in GBS which encodes amino acid sequence <SEQ ID 
10496>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3150

A DNA sequence <SEQ ID 10497> was identified in GBS which encodes amino acid sequence <SEQ ID 
10498>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3151 

A DNA sequence <SEQ ID 10499> was identified in GBS which encodes amino acid sequence <SEQ ID 
10500>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3152 

A DNA sequence <SEQ ID 10501> was identified in GBS which encodes amino acid sequence <SEQ ID 
10502>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3153 

A DNA sequence <SEQ ID 10503> was identified in GBS which encodes amino acid sequence <SEQ ID
10504>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3154 

A DNA sequence <SEQ ID 10505> was identified in GBS which encodes amino acid sequence <SEQ ID 
10506>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3155

A DNA sequence <SEQ ID 10509> was identified in GBS which encodes amino acid sequence <SEQ ID 
10510>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3156 

A DNA sequence <SEQ ID 10511> was identified in GBS which encodes amino acid sequence <SEQ ID
10512>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3157 

A DNA sequence <SEQ ID 10513> was identified in GBS which encodes amino acid sequence <SEQ ID 
10514>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3158 

A DNA sequence <SEQ ID 10515> was identified in GBS which encodes amino acid sequence <SEQ ID
10516>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3159

A DNA sequence <SEQ ID 10517> was identified in GBS which encodes amino acid sequence <SEQ ID 
10518>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3160 

A DNA sequence <SEQ ID 10519> was identified in GBS which encodes amino acid sequence <SEQ ID
10520>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3161 

A DNA sequence <SEQ ID 10521> was identified in GBS which encodes amino acid sequence <SEQ ID 
10522>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3162

A DNA sequence <SEQ ID 10523> was identified in GBS which encodes amino acid sequence <SEQ ID 
10524>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3163 

A DNA sequence <SEQ ID 10525> was identified in GBS which encodes amino acid sequence <SEQ ID 
10526>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3164 

A DNA sequence <SEQ ID 10527> was identified in GBS which encodes amino acid sequence <SEQ ID 
10528>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3165 

A DNA sequence <SEQ ID 10529> was identified in GBS which encodes amino acid sequence <SEQ ID
10530>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3166 

A DNA sequence <SEQ ID 10531> was identified in GBS which encodes amino acid sequence <SEQ ID 
10532>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3167

A DNA sequence <SEQ ID 10533> was identified in GBS which encodes amino acid sequence <SEQ ID 
10534>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3168 

A DNA sequence <SEQ ID 10535> was identified in GBS which encodes amino acid sequence <SEQ ID 
10536>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3169 

A DNA sequence <SEQ ID 10537> was identified in GBS which encodes amino acid sequence <SEQ ID 
10538>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3170 

A DNA sequence <SEQ ID 10539> was identified in GBS which encodes amino acid sequence <SEQ ID
10540>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3171

A DNA sequence <SEQ ID 10541> was identified in GBS which encodes amino acid sequence <SEQ ID 
10542>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3172 

A DNA sequence <SEQ ID 10543> was identified in GBS which encodes amino acid sequence <SEQ ID
10544>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3173 

A DNA sequence <SEQ ID 10545> was identified in GBS which encodes amino acid sequence <SEQ ID 
10546>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

SEQ ID 10546 (GBS665) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 137 (lane 8-10; MW 41kDa) and in Figure 187 (lane 5; MW 41kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in
Figure 137 (lane 11 & 12; MW 16.1kDa), in Figure 141 (lane 4; MW 16kDa) and in Figure 179 (lane 6;
MW 16kDa). Purified GBS665-GST is shown in Figure 243, lane 4.

GBS665-His was purified as shown in Figure 230, lane 7-8.

Example 3174 

A DNA sequence <SEQ ID 10547> was identified in GBS which encodes amino acid sequence <SEQ ID 
10548>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10909> which encodes amino acid sequence <SEQ ID 10910> was
also identified.

Example 3175 

A DNA sequence <SEQ ID 10549> was identified in GBS which encodes amino acid sequence <SEQ ID 
10550>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3176 

A DNA sequence <SEQ ID 10551> was identified in GBS which encodes amino acid sequence <SEQ ID
10552>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3177 

A DNA sequence <SEQ ID 10553> was identified in GBS which encodes amino acid sequence <SEQ ID 
10554>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3178

A DNA sequence <SEQ ID 10555> was identified in GBS which encodes amino acid sequence <SEQ ID 
10556>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3179 

A DNA sequence <SEQ ID 10557> was identified in GBS which encodes amino acid sequence <SEQ ID 
10558>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3180

A DNA sequence <SEQ ID 10559> was identified in GBS which encodes amino acid sequence <SEQ ID 
10560>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3181 

A DNA sequence <SEQ ID 10561> was identified in GBS which encodes amino acid sequence <SEQ ID
10562>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3182 

A DNA sequence <SEQ ID 10563> was identified in GBS which encodes amino acid sequence <SEQ ID 
10564>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3183

A DNA sequence <SEQ ID 10565> was identified in GBS which encodes amino acid sequence <SEQ ID 
10566>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3184 

A DNA sequence <SEQ ID 10567> was identified in GBS which encodes amino acid sequence <SEQ ID 
10568>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3185 

A DNA sequence <SEQ ID 10569> was identified in GBS which encodes amino acid sequence <SEQ ID 
10570>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3186 

A DNA sequence <SEQ ID 10571> was identified in GBS which encodes amino acid sequence <SEQ ID
10572>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3187 

A DNA sequence <SEQ ID 10573> was identified in GBS which encodes amino acid sequence <SEQ ID 
10574>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3188

A DNA sequence <SEQ ID 10575> was identified in GBS which encodes amino acid sequence <SEQ ID 
10576>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3189 

A DNA sequence <SEQ ID 10577> was identified in GBS which encodes amino acid sequence <SEQ ID 
10578>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3190 

A DNA sequence <SEQ ID 10579> was identified in GBS which encodes amino acid sequence <SEQ ID 
10580>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3191 

A DNA sequence <SEQ ID 10581> was identified in GBS which encodes amino acid sequence <SEQ ID
10582>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3192

A DNA sequence <SEQ ID 10583> was identified in GBS which encodes amino acid sequence <SEQ ID 
10584>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3193 

A DNA sequence <SEQ ID 10585> was identified in GBS which encodes amino acid sequence <SEQ ID
10586>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3194 

A DNA sequence <SEQ ID 10587> was identified in GBS which encodes amino acid sequence <SEQ ID 
10588>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3195

A DNA sequence <SEQ ID 10591> was identified in GBS which encodes amino acid sequence <SEQ ID 
10592>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3196 

A DNA sequence <SEQ ID 10593> was identified in GBS which encodes amino acid sequence <SEQ ID 
10594>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3197 

A DNA sequence <SEQ ID 10595> was identified in GBS which encodes amino acid sequence <SEQ ID 
10596>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3198 

A DNA sequence <SEQ ID 10597> was identified in GBS which encodes amino acid sequence <SEQ ID
10598>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10797> which encodes amino acid sequence <SEQ ID 10798> was 
also identified. 

Example 3199 

A DNA sequence <SEQ ID 10599> was identified in GBS which encodes amino acid sequence <SEQ ID
10600>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3200 

A DNA sequence <SEQ ID 10601> was identified in GBS which encodes amino acid sequence <SEQ ID 
10602>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3201

A DNA sequence <SEQ ID 10603> was identified in GBS which encodes amino acid sequence <SEQ ID 
10604>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3202 

A DNA sequence <SEQ ID 10605> was identified in GBS which encodes amino acid sequence <SEQ ID 
10606>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3203

A DNA sequence <SEQ ID 10607> was identified in GBS which encodes amino acid sequence <SEQ ID 
10608>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3204 

A DNA sequence <SEQ ID 10609> was identified in GBS which encodes amino acid sequence <SEQ ID
10610>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3205 

A DNA sequence <SEQ ID 10611> was identified in GBS which encodes amino acid sequence <SEQ ID
10612>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3206

A DNA sequence <SEQ ID 10613> was identified in GBS which encodes amino acid sequence <SEQ ID 
10614>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3207 

A DNA sequence <SEQ ID 10615> was identified in GBS which encodes amino acid sequence <SEQ ID 
10616>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3208 

A DNA sequence <SEQ ID 10617> was identified in GBS which encodes amino acid sequence <SEQ ID
10618>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3209 

A DNA sequence <SEQ ID 10619> was identified in GBS which encodes amino acid sequence <SEQ ID
10620>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3210 

A DNA sequence <SEQ ID 10621> was identified in GBS which encodes amino acid sequence <SEQ ID 
10622>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3211

A DNA sequence <SEQ ID 10623> was identified in GBS which encodes amino acid sequence <SEQ ID 
10624>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3212 

A DNA sequence <SEQ ID 10625> was identified in GBS which encodes amino acid sequence <SEQ ID 
10626>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3213 

A DNA sequence <SEQ ID 10627> was identified in GBS which encodes amino acid sequence <SEQ ID 
10628>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3214 

A DNA sequence <SEQ ID 10629> was identified in GBS which encodes amino acid sequence <SEQ ID
10630>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 



WO 02/34771 



-2943- 



PCT/GB01/04789 



Example 3215 

A DNA sequence <SEQ ID 10631> was identified in GBS which encodes amino acid sequence <SEQ ID 
10632>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3216 

A DNA sequence <SEQ ID 10633> was identified in GBS which encodes amino acid sequence <SEQ ID
10634>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related
GBS nucleic acid sequence <SEQ ID 10939> which encodes amino acid sequence <SEQ ID 10940> was
also identified.

SEQ ID 10634 (GBS675) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 162 (lane 14 & 15; MW 56kDa). It was also expressed in E.coli as a
His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 163 (lane 2; MW 31kDa) and in
Figure 188 (lane 5; MW 31kDa).

Purified GBS675-His is shown in Figure 240, lane 7-8. 

Example 3217 

A DNA sequence <SEQ ID 10635> was identified in GBS which encodes amino acid sequence <SEQ ID
10636>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3218 

A DNA sequence <SEQ ID 10637> was identified in GBS which encodes amino acid sequence <SEQ ID 
10638>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3219

A DNA sequence <SEQ ID 10639> was identified in GBS which encodes amino acid sequence <SEQ ID 
10640>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3220 

A DNA sequence <SEQ ID 10641> was identified in GBS which encodes amino acid sequence <SEQ ID 
10642>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3221 

A DNA sequence <SEQ ID 10643> was identified in GBS which encodes amino acid sequence <SEQ ID 
10644>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3222 

A DNA sequence <SEQ ID 10645> was identified in GBS which encodes amino acid sequence <SEQ ID
10646>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3223 

A DNA sequence <SEQ ID 10647> was identified in GBS which encodes amino acid sequence <SEQ ID 
10648>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3224

A DNA sequence <SEQ ID 10649> was identified in GBS which encodes amino acid sequence <SEQ ID 
10650>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 



Example 3225 

A DNA sequence <SEQ ID 10651> was identified in GBS which encodes amino acid sequence <SEQ ID 
10652>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3226 

A DNA sequence <SEQ ID 10653> was identified in GBS which encodes amino acid sequence <SEQ ID
10654>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3227 

A DNA sequence <SEQ ID 10655> was identified in GBS which encodes amino acid sequence <SEQ ID 
10656>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3228

A DNA sequence <SEQ ID 10657> was identified in GBS which encodes amino acid sequence <SEQ ID 
10658>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3229 

A DNA sequence <SEQ ID 10659> was identified in GBS which encodes amino acid sequence <SEQ ID 
10660>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3230 

A DNA sequence <SEQ ID 10661> was identified in GBS which encodes amino acid sequence <SEQ ID 
10662>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3231 

A DNA sequence <SEQ ID 10663> was identified in GBS which encodes amino acid sequence <SEQ ID
10664>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3232 

A DNA sequence <SEQ ID 10665> was identified in GBS which encodes amino acid sequence <SEQ ID 
10666>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10917> which encodes amino acid sequence <SEQ ID 10918> was
also identified. 

A DNA sequence <SEQ ID 10667> was identified in GBS which encodes amino acid sequence <SEQ ID 
10668>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3233 

A DNA sequence <SEQ ID 10669> was identified in GBS which encodes amino acid sequence <SEQ ID
10670>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3234 

A DNA sequence <SEQ ID 10671> was identified in GBS which encodes amino acid sequence <SEQ ID 
10672>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3235

A DNA sequence <SEQ ID 10673> was identified in GBS which encodes amino acid sequence <SEQ ID 
10674>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3236

A DNA sequence <SEQ ID 10675> was identified in GBS which encodes amino acid sequence <SEQ ID 
10676>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3237 

A DNA sequence <SEQ ID 10677> was identified in GBS which encodes amino acid sequence <SEQ ID
10678>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3238 

A DNA sequence <SEQ ID 10679> was identified in GBS which encodes amino acid sequence <SEQ ID 
10680>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3239

A DNA sequence <SEQ ID 10681> was identified in GBS which encodes amino acid sequence <SEQ ID 
10682>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3240 

A DNA sequence <SEQ ID 10683> was identified in GBS which encodes amino acid sequence <SEQ ID 
10684>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3241 

A DNA sequence <SEQ ID 10685> was identified in GBS which encodes amino acid sequence <SEQ ID 
10686>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3242 

A DNA sequence <SEQ ID 10687> was identified in GBS which encodes amino acid sequence <SEQ ID
10688>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3243 

A DNA sequence <SEQ ID 10689> was identified in GBS which encodes amino acid sequence <SEQ ID 
10690>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3244

A DNA sequence <SEQ ID 10691> was identified in GBS which encodes amino acid sequence <SEQ ID 
10692>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

SEQ ID 10692 (GBS676) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 163 (lane 3-5; MW 66kDa) and in Figure 239 (lane 8; MW 66kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
163 (lane 7 & 8; MW 41kDa) and in Figure 188 (lane 6; MW 41kDa). Purified GBS676-His is shown in 
Figure 240, lane 4-5. Purified GBS676-GST is shown in Figure 246, lanes 10 & 11. 

Example 3245 

A DNA sequence <SEQ ID 10693> was identified in GBS which encodes amino acid sequence <SEQ ID 
10694>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3246

A DNA sequence <SEQ ID 10695> was identified in GBS which encodes amino acid sequence <SEQ ID 
10696>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3247 

A DNA sequence <SEQ ID 10697> was identified in GBS which encodes amino acid sequence <SEQ ID
10698>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3248 

A DNA sequence <SEQ ID 10699> was identified in GBS which encodes amino acid sequence <SEQ ID 
10700>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3249

A DNA sequence <SEQ ID 10703> was identified in GBS which encodes amino acid sequence <SEQ ID 
10704>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3250 

A DNA sequence <SEQ ID 10705> was identified in GBS which encodes amino acid sequence <SEQ ID 
10706>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3251 

A DNA sequence <SEQ ID 10707> was identified in GBS which encodes amino acid sequence <SEQ ID 
10708>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3252 

A DNA sequence <SEQ ID 10709> was identified in GBS which encodes amino acid sequence <SEQ ID
10710>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10803> which encodes amino acid sequence <SEQ ID 10804> was 
also identified. 

Example 3253 

A DNA sequence <SEQ ID 10711> was identified in GBS which encodes amino acid sequence <SEQ ID
10712>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. A related 
GBS nucleic acid sequence <SEQ ID 10913> which encodes amino acid sequence <SEQ ID 10914> was 
also identified. 

Example 3254 

A DNA sequence <SEQ ID 10713> was identified in GBS which encodes amino acid sequence <SEQ ID
10714>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3255 

A DNA sequence <SEQ ID 10715> was identified in GBS which encodes amino acid sequence <SEQ ID 
10716>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3256

A DNA sequence <SEQ ID 10717> was identified in GBS which encodes amino acid sequence <SEQ ID 
10718>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3257

A DNA sequence <SEQ ID 10719> was identified in GBS which encodes amino acid sequence <SEQ ID 
10720>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3258 

A DNA sequence <SEQ ID 10721> was identified in GBS which encodes amino acid sequence <SEQ ID
10722>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3259 

A DNA sequence <SEQ ID 10723> was identified in GBS which encodes amino acid sequence <SEQ ID 
10724>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3260

A DNA sequence <SEQ ID 10725> was identified in GBS which encodes amino acid sequence <SEQ ID 
10726>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3261 

A DNA sequence <SEQ ID 10727> was identified in GBS which encodes amino acid sequence <SEQ ID 
10728>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3262 

A DNA sequence <SEQ ID 10729> was identified in GBS which encodes amino acid sequence <SEQ ID 
10730>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

SEQ ID 10730 (GBS670) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 140 (lane 2-4; MW 45.3kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 140 (lane 5-7; MW 20.4kDa) and in 
Figure 179 (lane 10; MW 20kDa). 

GBS670-His was purified as shown in Figure 230, lane 9-10.

Example 3263

A DNA sequence <SEQ ID 10731> was identified in GBS which encodes amino acid sequence <SEQ ID
10732>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3264 

A DNA sequence <SEQ ID 10733> was identified in GBS which encodes amino acid sequence <SEQ ID 
10734>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3265

A DNA sequence <SEQ ID 10735> was identified in GBS which encodes amino acid sequence <SEQ ID 
10736>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3266 

A DNA sequence <SEQ ID 10737> was identified in GBS which encodes amino acid sequence <SEQ ID 
10738>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3267

A DNA sequence <SEQ ID 10739> was identified in GBS which encodes amino acid sequence <SEQ ID 
10740>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3268 

A DNA sequence <SEQ ID 10741> was identified in GBS which encodes amino acid sequence <SEQ ID
10742>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3269 

A DNA sequence <SEQ ID 10743> was identified in GBS which encodes amino acid sequence <SEQ ID 
10744>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3270

A DNA sequence <SEQ ID 10745> was identified in GBS which encodes amino acid sequence <SEQ ID 
10746>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3271 

A DNA sequence <SEQ ID 10747> was identified in GBS which encodes amino acid sequence <SEQ ID 
10748>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3272 

A DNA sequence <SEQ ID 10749> was identified in GBS which encodes amino acid sequence <SEQ ID 
10750>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3273 

A DNA sequence <SEQ ID 10751> was identified in GBS which encodes amino acid sequence <SEQ ID
10752>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3274 

A DNA sequence <SEQ ID 10753> was identified in GBS which encodes amino acid sequence <SEQ ID 
10754>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3275

A DNA sequence <SEQ ID 10755> was identified in GBS which encodes amino acid sequence <SEQ ID 
10756>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3276 

A DNA sequence <SEQ ID 10757> was identified in GBS which encodes amino acid sequence <SEQ ID 
10758>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3277 

A DNA sequence <SEQ ID 10759> was identified in GBS which encodes amino acid sequence <SEQ ID 
10760>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3278 

A DNA sequence <SEQ ID 10761> was identified in GBS which encodes amino acid sequence <SEQ ID
10762>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3279

A DNA sequence <SEQ ID 10763> was identified in GBS which encodes amino acid sequence <SEQ ID 
10764>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3280 

A DNA sequence <SEQ ID 10765> was identified in GBS which encodes amino acid sequence <SEQ ID
10766>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3281 

A DNA sequence <SEQ ID 10767> was identified in GBS which encodes amino acid sequence <SEQ ID 
10768>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3282

A DNA sequence <SEQ ID 10769> was identified in GBS which encodes amino acid sequence <SEQ ID 
10770>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3283 

A DNA sequence <SEQ ID 10771> was identified in GBS which encodes amino acid sequence <SEQ ID 
10772>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3284 

A repeated DNA sequence <SEQ ID 10791> was identified in GBS which encodes amino acid sequence 
<SEQ ID 10792>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3285 

A DNA sequence <SEQ ID 10805> was identified in GBS which encodes amino acid sequence <SEQ ID
10806>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3286 

A DNA sequence <SEQ ID 10807> was identified in GBS which encodes amino acid sequence <SEQ ID 
10808>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3287

A DNA sequence <SEQ ID 10809> was identified in GBS which encodes amino acid sequence <SEQ ID 
10810>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3288 

A DNA sequence <SEQ ID 10811> was identified in GBS which encodes amino acid sequence <SEQ ID
10812>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3289 

A DNA sequence <SEQ ID 10813> was identified in GBS which encodes amino acid sequence <SEQ ID 
10814>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3290 

A DNA sequence <SEQ ID 10815> was identified in GBS which encodes amino acid sequence <SEQ ID
10816>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3291

A DNA sequence <SEQ ID 10817> was identified in GBS which encodes amino acid sequence <SEQ ID 
10818>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3292 

A DNA sequence <SEQ ID 10819> was identified in GBS which encodes amino acid sequence <SEQ ID
10820>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3293 

A DNA sequence <SEQ ID 10821> was identified in GBS which encodes amino acid sequence <SEQ ID 
10822>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3294

A DNA sequence <SEQ ID 10823> was identified in GBS which encodes amino acid sequence <SEQ ID 
10824>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3295 

A DNA sequence <SEQ ID 10825> was identified in GBS which encodes amino acid sequence <SEQ ID 
10826>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3296 

A DNA sequence <SEQ ID 10827> was identified in GBS which encodes amino acid sequence <SEQ ID 
10828>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3297 

A DNA sequence <SEQ ID 10829> was identified in GBS which encodes amino acid sequence <SEQ ID
10830>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3298 

A DNA sequence <SEQ ID 10831> was identified in GBS which encodes amino acid sequence <SEQ ID
10832>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3299

A DNA sequence <SEQ ID 10833> was identified in GBS which encodes amino acid sequence <SEQ ID 
10834>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3300 

A DNA sequence <SEQ ID 10835> was identified in GBS which encodes amino acid sequence <SEQ ID 
10836>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3301 

A DNA sequence <SEQ ID 10837> was identified in GBS which encodes amino acid sequence <SEQ ID 
10838>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3302 

A DNA sequence <SEQ ID 10839> was identified in GBS which encodes amino acid sequence <SEQ ID
10840>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3303

A DNA sequence <SEQ ID 10841> was identified in GBS which encodes amino acid sequence <SEQ ID 
10842>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3304 

A DNA sequence <SEQ ID 10843> was identified in GBS which encodes amino acid sequence <SEQ ID
10844>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3305 

A DNA sequence <SEQ ID 10845> was identified in GBS which encodes amino acid sequence <SEQ ID 
10846>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3306

A DNA sequence <SEQ ID 10847> was identified in GBS which encodes amino acid sequence <SEQ ID 
10848>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3307 

A DNA sequence <SEQ ID 10849> was identified in GBS which encodes amino acid sequence <SEQ ID 
10850>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3308 

A DNA sequence <SEQ ID 10851> was identified in GBS which encodes amino acid sequence <SEQ ID 
10852>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3309 

A DNA sequence <SEQ ID 10853> was identified in GBS which encodes amino acid sequence <SEQ ID
10854>. Related sequences are <SEQ ID 10855>, <SEQ ID 10856>, <SEQ ID 10857>, <SEQ ID 10858>, 
<SEQ ID 10859>, <SEQ ID 10860>, <SEQ ID 10861>, <SEQ ID 10862>, <SEQ ID 10863>, <SEQ ID 
10864>, <SEQ ID 10865> and <SEQ ID 10866>. These proteins and their epitopes could be useful 
antigens for vaccines and/or diagnostics. 

Example 3310

A DNA sequence <SEQ ID 10867> was identified in GBS which encodes amino acid sequence <SEQ ID 
10868>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3311 

A DNA sequence <SEQ ID 10869> was identified in GBS which encodes amino acid sequence <SEQ ID 
10870>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics.

Example 3312 

A DNA sequence <SEQ ID 10871> was identified in GBS which encodes amino acid sequence <SEQ ID 
10872>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3313 

A DNA sequence <SEQ ID 10873> was identified in GBS which encodes amino acid sequence <SEQ ID 
10874>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 
Example 3314 

A DNA sequence <SEQ ID 10875> was identified in GBS which encodes amino acid sequence <SEQ ID 
10876>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3315 

A DNA sequence <SEQ ID 10877> was identified in GBS which encodes amino acid sequence <SEQ ID 
10878>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3316 

A DNA sequence <SEQ ID 10879> was identified in GBS which encodes amino acid sequence <SEQ ID 
10880>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3317 

A DNA sequence <SEQ ID 10881> was identified in GBS which encodes amino acid sequence <SEQ ID 
10882>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3318 

A DNA sequence <SEQ ID 10883> was identified in GBS which encodes amino acid sequence <SEQ ID 
10884>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3319 

A DNA sequence <SEQ ID 10885> was identified in GBS which encodes amino acid sequence <SEQ ID 
10886>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3320 

A DNA sequence <SEQ ID 10887> was identified in GBS which encodes amino acid sequence <SEQ ID 
10888>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3321 

A DNA sequence <SEQ ID 10889> was identified in GBS which encodes amino acid sequence <SEQ ID 
10890>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3322 

A DNA sequence <SEQ ID 10891> was identified in GBS which encodes amino acid sequence <SEQ ID 
10892>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3323 

A DNA sequence <SEQ ID 10893> was identified in GBS which encodes amino acid sequence <SEQ ID 
10894>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3324 

A DNA sequence <SEQ ID 10895> was identified in GBS which encodes amino acid sequence <SEQ ID 
10896>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3325 

A DNA sequence <SEQ ID 10897> was identified in GBS which encodes amino acid sequence <SEQ ID 
10898>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 
Example 3326 

A DNA sequence <SEQ ID 10899> was identified in GBS which encodes amino acid sequence <SEQ ID 
10900>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3327 

A DNA sequence <SEQ ID 10901> was identified in GBS which encodes amino acid sequence <SEQ ID 
10902>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3328 

A DNA sequence <SEQ ID 10903> was identified in GBS which encodes amino acid sequence <SEQ ID 
10904>. This protein and its epitopes could be useful antigens for vaccines and/or diagnostics. 

Example 3329 

Seven rRNA genes were identified in S.agalactiae. These are SEQ IDs 12018 to 12024. These rRNA genes 
are particularly useful for diagnostic purposes and for phylogenetic studies. An alignment of the rRNA 
sequences is shown below: 



12023 TTTCGAGTCAAAGTCATCAGCGTT 

12024 

12019 TCCAaTCATACTTAATTTCACTAATATCTGGATTTTGACATATTCAGTTAATTCT 

12021 ... ATCGAATTGAACGGACTCAATTO3GTTGTTATGTAATTTT- -ACATAATCTATGATTTCT 

12020 

12018 

12022 CTTCTTTGTTTTCTTTAGAGATATTAACTGTA 

12023 TACTGTTACGGCAGCAGTTCCaAGAGTTACTCCAOTCIACAAGGACTGCTGATAATATTCT 

12024 --- - --- 

12019 TTTTCATGCTTTTTGAGATAAGCTACTTGTTCTTTTTTTATTACTTTTTTACCTTTCTTT 

12021 TGCTCATGCTCTTTGAGATAGGCTAATTGTTCTTTTTTTGTCATTTTTTTATCTTTCTTC 

12020 

12018 

12022 CCCACTTTGGGCGTTAAAATACCTAAAGTAGCCTTTATTAAAGTTGATTTAGCAGCCCCA 

12023 TTTTTTCATTTTTATTAAACTACTCCTTTAC- -GATAAGACATTAAATATTTTACCAAAA 

12024 - 

12019 ACTGCTGACTGTTTGCTATTTTTTACTTCGTTTGACTGACTTTTAGATTCACTATTCATT 

12021 ACTTCTGATTGCTTGCTATTTTTTACTTCGTTTGACTGAATTTTATGTTCACTATTCATT 

12020 

12018 CTTT - GATACAATATTATCAAAATTATATTAA 

12022 TTTTCACCTGTTAAGGTAACARACTCCCCACT-GTCTAAATGGTAATTAACCCCTTCCAG 

12023 AATTCACGAAATTATATTACGTCATTGTTACATTTATATTTGAAATCAACTATTTCTAAa. 

12024 

12019 TGACAGCCTGCTAGTAACATCCCAATAATAGATATGGGAATTAACCATTTTACATATTTT 

12021 TGACAGCCTCCAAGTATCATCCCAARAATTGATATGGGAATTAaCCATTTTATATATTTT 

12020 

12018 CGGTAAAGATATTGTTAAAGACCAAACTTGGATTATCAATCGT TATCAAGAAATTA 

12022 CA- CAGGATCGCTATCGTACTGAAAAGTAAGACCACTAACTGTAATATATCGCATGATTA 

12023 TGARCCATAATCAAATCTAGAAAA.CGATAACCTTCTTCTATTCACTCT ATCAATATA 

12024 

12019 TTCAACATGCTCTCTTTTCTTAGAAAATAAACTTCCCATGTCAAGTATCTAATAAAAATA 

12021 CTCATCATGTTCTCTTTTCTTAGAATATAAATTTTATATATCAAGTATATAATGARATTA 

12020 

12018 TTAGTG- - -ATTTGTCTTTAGGAAGCACTA TTGCAGAAGA- - -AATTACTCG 

12022 CCCTTCT- -AATTCTCTAGAGAAAAGATCSiAGAAAACGTTCTAAAACG ACCTTTTCG 

12023 ATTACTCCATAGTGAAACTAAAAGAGAAATAAAAAAAGAGTATAATTACTCTTAAAATTA 

12024 

12019 ATTATTATTTACCAGTATGTTAAAaCTAATATTAGTATAACAAa-TTTTCACGAGTTTAA 
12021 
12020 
12018 
12022 



ACTATTATTCACC^CATTATAAAATTAATTTTAGTATAACAAAATTTTCACGTATTTTT 

ATCAAAAAAACATGACCAGTATGAATTAAAGCAACGTATAATCAATGCCT 

CTCTAT- -AGAGCAGCTAGCTTCACTTCCCATAGAAAATAATCAGTTTTTAT-ATGAT- - 
TCCTTTGAAAAATGATTTACTAATCmca^AAACCCCTAACGTATTGTCATGATGATGT 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



TAATATTTACGGAGAATAAGGGATTCGAACCCTTGCGCCAGTTACCCGACCTAACGATTT 

TT--TTTTAGTCGTAACATATACACTGAAAAATCTTATTATTTTATACTACCTATCTATC 

ATAGTTTTAGTCTTAACATGTAAACAGAAA A TC 

TAATGCGTAAAGGATACCAGTACGAAGATA TC 

TGTTTTTTAGCAGCCGGTGAAGATA ACAACGCAAAGTT 

GTGTGTTCATCTGCAATGGGTTTAGCAAGT TCA GATAACTCAAAATA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



AGCAAACCGTCCTCTTCAGCCTCTTGAG--TAATTCTCCAAATTAATATTAATGGGCACG 

ATTCACAAACACTTTTATTACTTCAGAACCTATGACATTTAGGAGTCCTCTTTGAATTTC 

ATTTGTATA T TTTAAATGCCCTAATTAAATT - - 

AAAAGTGC T TTAAGAGAATATTTATAAGAT- - 

AGTTGCA-ACGTTTTTTAATCAAAATGA- - CATTCCTGCAAGATATGTTCATCCAAACGA 
AGTAATACGAGCATCTTTAGAATCTTTA--TTCGCTTTCAACATATCCTGAGA-AATTAA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



AGTGGACTCGAACCACCGACCTCACGCTTATCAGGCGTGCGCTCTAACCACCTGAGCTAC 

ATTTAAATGTTGAGTCTCCACTAACTCTTGAAAAATTTCCTTATTATTTCTGCTTGTTTT 

AATAATT AATATTTATTATTATATA 

AATAACTCTCAGACGATGTATT - TTACAGA 

AGCAGGAATTATTGTAACTAAAGAACCATG- -TAATGCACGAATTATT- -CCAG GA 

ACTTTTTACTGCTTTAGTTACAGCTGCCTGACTAATATTTAACTTCTTAGCTAAATCAGA 

GCGCCCAAGCAAATGCTTGGTTTTACTTTTATGTAAAGTAAGCGGGTGACGAGAATCGA- 

AAACCTTCTATAAQ^TTGCAATAATGAAAAACAAATATAAGTAATTTTCAGTAACTTTT 

AATTCTTCTACAATGA AARAAATAAATATAT--A-TTACAAGTAACATT- 

AAAT TATGATAA A CTATAACAGACGTAT - -AAATTGTAGAAAGTTG- 

AGTTATGATAAGATTGA GAACTTATGTCTATACAATGAGGTTCTTGTTATCCCT 

ATTTGTCAACTGCTCTT GTGATAAAAGCATCAGAATGTGTTCTTGCGTATTAGT 






12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



-ACTCGCGACAACAGCTTGGAAGGCTGTAGTTTTACCACTAAACTACACCCGCTAAABAC 

TCTCAAAATTACCAGCACAATACAAAAAAGACAAGGCTTCTAAACCTTGTCTTTATAAAT 
- - TCACAATAAATTATCTAGTAGAAAAAAGACAAGGTTTAGAAACCTTGTCTTTATAAGT 

GTAGGCTATGAGATTACCTAAAGAAGGCGACTTTATTACAATTCAAAGTTACAARC 

GGATTT TTTGG- -AGTCACAGAAGATAAC- CAAATTTGTACCTTTTCAAGA 

CAATTTAA - CATCACTTTGACAAGTACCAAACAATAATTCATGTTGATTTTCTGCTTTAA 

TTATATAATAAATGGCGCGAGACGGAATCGAACCGCCGACACATGGAGCTTCAATCCATT 

ATACCGGCGGCCGGGGTCGAACCGGCACGTCCGTGAGGACACTGGATTTTGAGTCCAGCG 
ATACCGGCGGCCGGGGTCGAACCGGCACGTCCGTGAGGACACTGGATTTTGAGTCCAGCG 

ATGATGGTAGTTTACACCGAACTTG GCGTGACACCA- TGGTATTAAAAACAACCG 

GGGGGATCTGACATTACTGGATC CCTAATTGC AGCAGGCATAAA 

GCAAGATTTGAC - TCACTAAATGG TCTAATTTTTGTTCTAAAACTGTCATATA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



GCTCTACCAACTGAGCTACCGAGCCTATTGCGGGAGCAGGATTTGAACCTACGACCTTCG 

CGTCTGCCAATTCCGCCACGCCGGCTATCTTAAAACTGGGGTAGCTGGATTCGA--ACCA 
CGTCTGCCAATTCCGCCACGCCGGCTATCTTAAAACTGGGGTAGCTGGATTCGA--ACCA 

AAAATGCC- -CTCATTGGTGTTAATGATCAT ACTTTAGTAACAGAAAATGATGGTCG 

AGCAGACCT - TTATGAGAACTTCACAGATGT TGATGGTATATTTGCAGCACATCCA 

TACCT - CTT - TTTTGTTAACCAGTAAATTATATCACGAAGATATAGAAGAATCAATCATA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



GGTTA- TGAGCCCGACGAGCTACCTAGCTGCTCCA TCCCGCGATATCTTTAAA 

ACGCA-TGAGGGAGTCAAAGTCCCTTGCCTTACCG CTTGGCTATACCCCATGA 

ACGCA- TGAGGGAGTCAAAGTCCCTTGCCTTACCG CTTGGCTATACCCCATGA 

ACGC- -TGGGTGACACGAGAGCC- -TGCAATA GTATACTTTCATA 

GGT GTAGTTAAGAACCCTCACGCTA TCCCTGAGCTTACTTATA 

GATAGGTGAAGAAGATAAAACCTTTTATCTCAACAACCTAACTTTATAAACTTCTTTGCA 



12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



GGA GGATGTGGGATT03AACCCaCGCACGCTTTTACAC--GCCTGACGGTT 

AAAGGCG AGTGATGGGAATCGAACCCACGAATGTCAGAGCCACAATCTGATGTGT 

AAAGGCG AGTGATGGGAATCGAACCCACGAATGTCAGAGCCACAATCTGATGTGT 

AAA AATACTGG- - -T TT- -AACATTATCGCTA TGATACGT 

AAGA AATGCGTGAATTAGCCTATGCGGGTTTTTCGGTTT-TACATGATGAA- 

AAAACCTTTCATACTATTAAAAACACGATCAGCTTTTTTCTCTGTAG-AACACATTGAAR 

TTCAAGACCGTTCCCTTCAGCCGGACTTGGGTAATCCTCCATATAACAAAAAATATGGAC 

TAACCACTTCACCACACCCGCCATATTAGAAAAAACACGGGCAGTAGGAATCGAACCCAC 
TAACCACTTCACCACACCCGCCATATTAGAAAAAACACGGGCAGTAGGAATCGAACCCAC 

GAAACTGGTGTCTCCTACTATTGTAATCTAGCAAGT CCGTATATCTTGGACCC-- 

GCTTTACTTCCTGCCTATCGTGGCAGAATCCCTCTTGTTATTAAAAATAC-- 

AAACAGTTGGTCCACTTCCTGTC-ATTAATGCAA.CATCGGCTCCAGAATTTAACATAC-- 






12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



CTTGTAGGACTCGAACCTACGACCGCTCGGTTATGAGCCGAGTGCTCTAACCAGTTGAGC 

ACTGAAGGTTTTGGAGACCTTAGTTCTACCTTTAAACTATGCCCGTTTACTATGGAGAGA 
ACTGAAGGTTTTGGAGACCTTAGTTCTACCTTTAAACTATGCCCGTTTACTATGGAGAGA 
- - TGAAGCACTCAAGTATATTGACTATGACCTTGATGTCARAGTATTTGCAGATGGTGAA 

AAA TAATCCCCAACAGCCTGGTACAAAAATAGTTTTAAAGCATACTCGTAG- 

GTTCTTTTATTGTACTTATAACTGGATTTTTAGTAATTGTAATATCCTCGAGTGAA 

TAAAGGTCCAAAGTCTCAATAAAATAAATAGCGGCGGAGGGGATCGAACCCCCGACCTCC 

GAGGGATTCGAACCCCCGAACCCGAAGGAGCGGATTTACAGTCCGCCGCGTTTAGCCTCT 
GAGGGATTCGAACCCCCGAACCCGAAGGAGCGGATTTACAGTCCGCCGCGTTTAGCCTCT 
AAAAGACTACTAGATGTGGACGAATATGAACAGCATAAAGYTCAGATGAACT- -ATCCTA 
- -TAACATAGCAGTAACTGG-GATCGCT- -TCTGATAGCCGTTTTGCTAGCATAAACGTA 
TTTCCCATAGATTTGACCATTAACTGATAATCTGATGACIAAAATAGCAGACTTTAATAAA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



CGGGTATG-AACCGGACGCTCTAGCCAGCT--GAGCTACACCGCCATAAAAATATATCCA 

TCGCTATC-TCTCCTAAGGTATAAATGGCGCGAGACGGAATCGAACCGCCGACACATGGA 
TCGCTATC-TCTCCTAAGGTATAAATGGCGCGAGACGGAATCGAACCGCCGACACATGGA 
CCGATATT - GATTATATATTAAAGGAAAATGTAAAAATATTGGTAGAATGGATAAATGAG 

TCTAAAT- -ACTTAATGAATAGA GAAGTAGGTTTCGGCCGAAAAG TACTACAA 

TCAATATCAACTCTACTTATAGACTTACAATCAATATCTCTAAAAATGGATTTAGTTGAA 






12023 
12023 
12024 
12019 
12021 
12020 
12018 
12022 



TCGGGAAGACAGGATTCGAACCTGCGACACCTTGGTCCCAAACCAAGTACTCTACCAAGC 

GCTTCAATCCATTGCTCTACCAACTGAGCTACCGAGCCTATTGCGGGAGCAGGATTTGAA 
GCTTCAATCCATTGCTCTACCAACTGAGCTACCGAGCCTATTGCGGGAGCAGGATTTGAA 

AATAAAGGCCCCTTTTC-ATCATC--ATATATCAA-TATCTGGTATAAACGGTA- 

ATTTTAGAG GATTTAAATATT AGTTTTGAACATATGCCAACTGGCATAGATGAT 

ATACCAAAATCCGGCTTAACCAGA- - -ACTATCCAACATGGTCTCAATGTCGGTAAGGGT 

TGAGCTACTTCCCGAAAAATATGCAC--CCTAGAGGAGTCGAACCTCTAACCGCCTGATT 

CCTACGACCTTCGGGTTATGAGCCCG--ACGAGCTACCTAGCTGCTCCATCCCGCGATAT 
CCTACGACCTTCGGGTTATGAGCCCG--ACGAGCTACCTAGCTGCTCCATCCCGCGATAT 

CCTTGAATTGAAA AAGCGCTAACTAAC - ACACTAAATAGTG - TGT 

CTATCCATTGT CTTACGTGAAA AAGAATTGACACCAATCAAAGAACAAGAAATC 

TTAACAATTTCACCTTTACCTAATACTAACGAACATCCCCCACCAAGACAATAAGGAACA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 



CGTAGTCAG GTACTCTATCCAGTTGAGCTAAGGGTGCTAAATATTATA- 



-TGCC 



CTTTAAAGGAGGATGTGGGATTCGAACCCACGCACGCTTTTACACGCCTGACG--GTTTT 
CTTTAAAGGAGGATGTGGGATTCGAACCCACGCACGCTTTTACACGCCTGACG--GTTTT 

TTTTATTA ATATCAAATTTAATTACA ATACTATTGCAAAAATAT ATACT 

TTAAATTACCTAACTCGTAAACTAGAAGTAG--ATTACGTTGACATCCAA 

TC - -ACTACC -AATTTTAAAACCAATAGCAACCATTTCGTCATAGTCCATTTGAAGATTC 

GAGGACCGGAATC GAACCGGTACGATGTTTACCATCGCAGGATTTTAAGTCCTGTG 

CAAGACCGTTCCCTTCAGCCGGACTTGC3GTAATCCTCCATATAACAAAAAATAGTCCGTA 
12021 
12020 
12018 
12022 



ClAAGRCCGTTCCCTTCAGCCGGACrTGGGTAATCCTCCATATAACAAAAAATAGTCCGTA 

TAAAATAAA AAAAGTAGAAAGATCACTTTCTACTTTTTTAAGAATAGTCCGTA 

CACAATCTATC TACaATCGTAATTGTAGGTGAAA-ATATGAAftAGTCAGATTG 

CATAATCGATT AAGAGCTCTTATTGTAGCAGCAGCATCAGTAGAACCACCCCC 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



CGTCTGCCAGTTCCGCCACCCCGGCCTCTAACAAGCGAACGACGGGGTTCGAACCCGCGA 

CGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTGTCTTAACCCCTTGACCAACGG 
CGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTGTCTTAACCCCTTGACCAACGG 
CGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTGTCTTAACCCCTTGACCAACGG 

GAGTCACTGCAACAGCGACACAAGCCTTATC AAGAGAAAAA ATCAATAT 

CAGTC - CTGCACAGACAGGAATGGATTTTTCTAATCTAATATGAACACCTTTATTAATAC 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



-CCCTCACCTT- 



-GGCaAGGTGATGTTCTACCACTGAACTACGTTCGCACTAAAGAC 



-ACCATATTCTTGATGGGCACGAGTGGACTCGAACCACCGACCTCACGCTTATCAGGCGT 
-ACCATATTCTTGATGGGCACGAGTGGACTCGAACCACCGACCTCACGCTTATCAGGCGT 
-ACCATATTCTTGATGGGCACGAGTGGACTCGAACCACCGACCTCACGCTTATCAGGCGT 

CACCATGAT ATCACA AGGTTCAAGCGAA- -GTCTCCATTATGT 

CATATTGATTTTTGATTATATCTGCAGCTTTAAACACATCATTATCATTATTTAAAGGCA 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



ACTATTTATCCTATAAAATTGTAATGCCGGC 

TATCCTATAAAATTGTAATGCCGGC 

GCGCTCTAACCACCTGAGCTACGCGCCCAAAATAACTTCTAAAATTATAAAGTTAATGCC 

GCGCTCTAACCACCTGAGCTACGCGCCCAAG CTA 

GCGCTCTAACCACCTGAGCTACGCGCCCAAG CTA 

- - TCGTTATAAACAGTAAGGATGAAAAAAGAG CTAT 

TTTTGCTACTATCAGAATCGATAACAATACAAT CTT CCTT 

** * 

- - -TACATGACTTGAACACGCGACCCTCTGATTACARATCAGATGCTCTACCAACTGAGC 

TACATGACTTGAACIACX3CGACCCTCTGATTACftAATC2AGATGCTCTACCAACTGAGC 

GGCTACATGACTTGAAC!ACX3CGACCCTCTGATTACAi\ATCAGATGCTCTACCAACTGAGC 

- - TTGCTTGGTTT T- -TACTTTCTTATA A 

- -TTGCTTGGTTT T- -TACTTTCTTATA A 

TAAAGCACTATATGAA-ACAT- -TCTTCCAAA- -AATAGTACCTATTACACTACTTACAC 
TAGCTCAGAAATGGTA-ACGTAGTCATTAAGATCAATACTAACCATAATCATAGCTAATT 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



TAAGCCGGCAATCTACTAATGCGGGTGAAGGGACTTGAACCCCCACGCCGTTAAGCGCCA 
TAAGCCGGCAATCTACTAATGCGGGTGAAGGGACTTGAACCCCCACGCCGTTAAGCGCCA 
TAAGCCGGCAATCTACTAATGCGGGTGAAGGGACTTGAACCCCCACGCCGTTAAGCGCCA 

A G TAAAGCGGGTGACGAGAATCGAACTC 

AG TAAAGCGGGTGACGAGAATCGAACTC 

TATTAGATAGATAA- - CAAATCGTCCT AAGTAAGCTTA CTTAGGACGA 

CATGATAACCATCGT - CACATCGTCCTTTAATATCTAATCCTAAATTAAGTTTGGCAGGA 
* ** * * * 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



GATCCTAAATCTGGTGCGTCTGCCAATTCCGCCACACCCGCATTTCTAAATGACCCGTAC 
GATCCTAAATCTGGTGCGTCTGCCAATTCCGCCACACCCGCATTTCTAAATGACCCGTAC 
GATCCTAAATCTGGTGCGTCTGCCAATTCCGCCACACCCGCATTTCTAAATGACCCGTAC 

GCGACAACAGC 

GCGACAACAGC 

TTTT ATTTAGAACATAGGATAGTTTTTCCACTTTTAATCGTAA CCACTT 

GCTT TCTCAAAAATTTTCATAAAACCTCCCTAATAAAATATAGAA-T-ATCCATAT 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



TGGGCTCGAACCAGTGACCCATTGATTAAAAGTCAATTGCTCTACCAACTGAGCTAACGA 
TGGGCTCGAACCAGTGACCCATTGATTAAAAGTCAATTGCTCTACCAACTGAGCTAACGA 
TGGGCTCGAACCAGTGACCCATTGATTAAAAGTCAATTGCTCTACCAACTGAGCTAACGA 

T TGGAAGGCTGTAGTTTTACCA- CTAAACTA 

TTGGAAGGCTGTAGTTTTACCA- CTAAACTA 

GGTATCA GTGACA AATTCGGA- -CAATTAAGATGTTAGCCAATCTTAAGG 

TATAACATAACAAATGACA AATTCGGA- -CAATTAAGATGCTAGCCAATCTTAAGG 

* *★* * ***** 



12023 GTCTACGGTCCCGACGGGAATCGAACCCGCGATCTTCGCCGTGACAGGGCGACGTGATAA 
12024 GTCTACGGTCCCGACGGGAATCGAACCCGCGATCTTCGCCGTGACAGGGCGACGTGATAA 
PCT/GB01/04789 




12019 GTCTACGGTCCCGACGGGAATCGAACCCGCGATCTTCGCCGTGACAGGGCGACGTGATAA 
12021 CACC 
12020 CACC 
12018 ATA-ATAATTCCAATAAAAA AAGGCTAACCAAAGTTAGTC 
12022 ATA-ATAATTCCAATAAAAA AAGGCTAACCAAAGTTAGTC 
* 

12023 CCGCTACACTACGGGACCTATGGGAGTTAACGGGATCGAACCGCTGACCCTCTGCTTGTA 
12024 CCGCTACACTACGGGACCTATGGGAGTTAACGGGATCGAACCGCTGACCCTCTGCTTGTA 
12019 CCGCTACACTACGGGACCTATGGGAGTTAACGGGATCGAACCGCTGACCCTCTGCTTGTA 
12021 -CGCT TCTATGGGAGTTAACGGGATCGAACCGCTGACCCTCTGCTTGTA 
12020 -CGCT TCTATGGGAGTTAACGGGATCGAACCGCTGACCCTCTGCTTGTA 
12018 TCCCTTTA TCTACTCCGCCAGTAGGACTCGAACCTACGACATCATGATTAAC 
12022 TCCCTTTA TCTACTCCGCCAGTAGGACTCGAACCTACGACATCATGATTAAC 
* ** *** ** ******* *** ** ** 

12023 AGGCAGATGCT-CTCCCAGCTGAGCTAAACTCCCTTT--GCTAAGCGACTACCTTATCTC 
12024 AGGCAGATGCT-CTCCCAGCTGAGCTAAACTCCCTTT--GCTAAGCGACTACCTTATCTC 
12019 AGGCAGATGCT-CTCCCAGCTGAGCTAAACTCCCTTT--GCTAAGCGACTACCTTATCTC 
12021 AGGCAGATGCT-CTCCCAGCTGAGCTAAACTCCCTTT--GCTAAGCGACTACCTTATCTC 
12020 AGGCAGATGCT-CTCCCAGCTGAGCTAAACTCCCTTT--GCTAAGCGACTACCTTATCTC 
12018 AGTCATGCGCTACTACCAACTGAGCTATGGCGGATTATAGCTAAGCGACTACCTTATCTC 
12022 AGTCATGCGCTACTACCAACTGAGCTATGGCGGATTATAGCTAAGCGACTACCTTATCTC 
** ** *** ** *** ******** ** ********************* 

12023 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12024 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12019 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12021 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12020 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12018 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
12022 ACAGGGGGCAACCCCCAACTACTTCCGGCGTTCTAGGGCTTAACTTCTGTGTTCGGCATG 
      ************************************************************ 

12023 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12024 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12019 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12021 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12020 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12018 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
12022 AGAACAGGTGTATCTCCTAGGCAATTATCACTTAACTATTGAGCCTTATTCACTCAAAAT 
      ************************************************************ 

12023 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12024 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12019 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12021 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12020 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12018 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
12022 TGAATATCTATAGTCTAACAAGAAACCGTAACGTTGTCAATATCTCTTTTTGGATAAGTC 
      ************************************************************ 

12023 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12024 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12019 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12021 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12020 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12018 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
12022 CTCGAGCTATTAGTATTAGTCCGCTAAATGTGTCACCACAATTACACTCCTAACCTATCT 
      ************************************************************ 

12023 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12024 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12019 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12021 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12020 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12018 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
12022 ACCTGATCATCTCTCAGGGCTCTTACTGATATAAAATCATGGGAAATCTCATCTTGAGGT 
      ************************************************************ 
12023 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12024 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12019 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12021 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12020 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12018 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
12022 GGGCTTCGCACTTAGATGCTTTCAGCGCTTATCCCTTCCCTACATAGCTACCCAGCGATG 
      ************************************************************ 
12023 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12024 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12019 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12021 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12020 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12018 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
12022 CCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGA 
      ************************************************************ 

12023 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12024 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12019 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12021 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12020 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12018 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
12022 GCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGT 
      ************************************************************ 
12023 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12024 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12019 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12021 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12020 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12018 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
12022 TCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTA 
      ************************************************************ 



12023 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12024 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12019 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12021 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12020 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12018 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
12022 CAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTC 
      ************************************************************ 
12023 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12024 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12019 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12021 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12020 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12018 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
12022 TTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTC 
      ************************************************************ 



12023 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12024 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12019 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12021 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12020 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12018 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
12022 CATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCG 
      ************************************************************ 

12023 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12024 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12019 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12021 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12020 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12018 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
12022 CAGTCAAGCTCCCTTATACCTTTACACTCTACGACTGATTTCCAACCAGTCTGAGGGAAC 
      ************************************************************ 
12023 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12024 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12019 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12021 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12020 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12018 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
12022 CTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGAC 
      ************************************************************ 

12023 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12024 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12019 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12021 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12020 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12018 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
12022 ACTGTCTCCGATAGGGATTGCCTATCTGGGTTAGAGTAGCCATAACACAAGGGTAGTATC 
      ************************************************************ 



12023 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12024 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12019 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12021 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12020 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12018 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
12022 CCAACAACGCCTCAAACGAAACTGGCGTCCCGTTATCATAGGCTCCTACCTATCCTGTAC 
      ************************************************************ 



12023 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12024 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12019 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12021 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12020 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12018 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
12022 ATGTGGTACAGATACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTG 
      ************************************************************ 
12023 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12024 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12019 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12021 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12020 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12018 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
12022 TCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAG 
      ************************************************************ 
12023 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12024 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12019 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12021 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12020 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12018 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
12022 TGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTAC 
      ************************************************************ 

12023 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12024 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12019 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12021 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12020 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12018 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
12022 CTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTA 
      ************************************************************ 
12023 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12024 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12019 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12021 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12020 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12018 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
12022 CGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCT 
      ************************************************************ 

12023 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12024 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12019 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12021 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12020 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12018 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
12022 TACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGC 
      ************************************************************ 
12023 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12024 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12019 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12021 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12020 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12018 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
12022 TGATCTAAAATCAGCGCCCCTTCTCCCGAAGTTACGGGGCCATTTTGCCGAGTTCCTTAA 
      ************************************************************ 
12023 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12024 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12019 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12021 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12020 CGAGAGTTCTCTCGCTCACMTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12018 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
12022 CGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTA 
      ******************* **************************************** 
12023 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12024 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12019 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12021 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12020 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12018 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
12022 CGGGTAGAGTATATGTATCGCTAGAAGCTTTTCTTGGCAGTGTGACATCACTAACTTCGC 
      ************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC 
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC 
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC 
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC 
TACTAAACTTCGCTCCTCGTCACAGCTCAATGTTAAAGATATAAGCATTTGACTCATATC 
************************************************************ 

ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
ACACCTCACTGTTTGACCAGACACTTCCAATCGTCTGGTTTAGTTAGCCTACTGCGTCCC 
************************************************************ 
12023
12024 
12019 
12021 



TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC 
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC 
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC 
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC 
12020
12018 
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC
TCCATCACTATATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTC 
************************************************************ 
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT 
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT 
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT 
GGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGT 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
CTTACGGTGGACAGGATTCTCACCTGTCTTGCGCTACTCATACCGGCATTCTCACTTCTA 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA 
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA 
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA 
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA 
TGCGTTCCAGCGCTCCTCACGGTACACCTTCTTCACACATAGAACGCTCTCCTACCATGA
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA 
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA 
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA 
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA 
CACTTTTGTGTCATCCACAGCTTCGGTAATATGTTTTAGCCCCGGTACATTTTCGGCGCA 
************************************************************ 
12023 
12024 
12019 
12021 
12020 
12018 
12022 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA 
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA 
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA 
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA 
GGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAA 
************************************************************ 

CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
CATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACATATATTTTGGGACCTT 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
AGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTG 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
CCGATTATATCTCGTTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCC 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
TCACCCAAACAGTGCTCTACCTCCAAGAGACTTAACATCGACGCTAGCCCTAAAGCTATT 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC 
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC 
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC 
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC 
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC
TCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATC 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC 
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC 
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC 
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC 
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC 
CAAGCACTTTTCAACGTGCCCTGGTTCGGTCCTCCAGTGAGTTTTACCTCACCTTCAACC
************************************************************ 
12024
12019 
12023
12024 
12019 
12021 
12020 
12018 
12022 



TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA 
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA 
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA 
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA 
TGCTCATGGGTAGGTCACATGGTTTCGGGTCTACAACATGATACTATGACGCCCTATTAA 
************************************************************ 

GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC 
GACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATATCGTAACTCGC
************************************************************ 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA 
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA 
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA 
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA 
CGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCA 
************************************************************ 






12023 
12024 
12019 
12021 
12020 



CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT 
CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT 
CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT
CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT 
CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT



12018 
12022 



CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT 
CACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGT 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
ACTGGTTCACTATCGGTCACTAGAGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTC 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT 
CGACGAGATTTCGCGTGTCTCGCCGTACTCAGGATACTGCTAAGGTTAATCTATCATTTT
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA 
AAATACGAGGCTGTTACTCTCTTTGGCTTACCTTCCCAGGTAATTCTTCTATAATGATTA
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12023
12024 
12019 
12021 
12020 
12018 
12022 



ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT 
ATCCTATATCGCAGTCCTACAACCCCGAAGTGTAAACACTTCGGTTTGCCCTCCTGCCGT 
************************************************************ 

TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
TTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGT 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
TTCAGTTCACTGCGTCTTCCTTCTCATATCCTTAACAGATATGGATACTAGTCATTAACT 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
AGTGGGTTCCCCCATTCGGACATCTCTGGATCAGCGCTTACTTACAGCTCCCCAAAGCAT 
************************************************************ 



12023 



TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT 
12024
12019 
12021 
12020 
12018 
12022 



TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT
TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT
TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT 
TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT 
TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT 
TTCGTCGTTAGTCACGTCCTTCTTCGGCTTCTAGTGCCAAGGCATCCACCGTGCGCCCTT 
************************************************************ 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
ATTAACTTAACCTTATTAACCTAGTTTCTTTAAAACTAGAAAACTCATTAAATATTCACA 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
GCGTTTTCGGTTTATTTTCTTGTTACTTTCTACAATCTATTTCTAGATCGTGGAATTTGA 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



12023 
12024 
12019 
12021 
12020 
12018 
12022 



TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG 
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG 
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG 
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG
TATAGATATTCAATTTTCAATGAACAATTTGAACCTTTCGATTCAATGGAGCCTAGCGGG
************************************************************ 

ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
ATCGAACCGCTGACCTCCTGCGTGCAAAGCAGGCGCTCTCCCAGCTGAGCTAAGGCCCCA 
************************************************************ 






12023 
12024 
12019 
12021 
12020 
12018 
12022 



CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG 
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG 
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG 
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG
CAAGACCTCTCAAAACTAAACAAGACGCAAATGGCAGGTTTCCTTATCCTTAGAAAGGAG 
************************************************************ 
12023
12024 
12019 
12021 
12020 
12018 
12022 



GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
GTGATCCAGCCGCACCTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATCTAT 
************************************************************ 
12024
12019 
12021 
12020 
12018 



CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
CCCACCTTAGGCGGCTGGCTCCTAAAAGGTTACCTCACCGACTTCGGGTGTTACAAACTC
************************************************************ 
TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT 

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT 

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT 

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT 

TCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGCGGCGTGCTGAT 
************************************************************ 
12024
12022



CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 

CCGCGATTACTAGCGATTCCGACTTCATGTAGGCGAGTTGCAGCCTACAATCCGAACTGA 
************************************************************ 
GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 

GATTGGCTTTAAGAGATTAGCTTGCCGTCACCGGCTTGCGACTCGTTGTACCAACCATTG 
************************************************************ 

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC 

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC 

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC 

TAGCACGTGTGTAGCCCAGGTCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCC 
************************************************************ 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 

TCCGGTTTATTACCGGCAGTCTCGCTAGAGTGCCCAACTTAATGATGGCAACTAACAATA 
************************************************************ 
GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA 

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA 

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA 

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA 

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA

GGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCA 
************************************************************ 
TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT 

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT 

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT 

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT 

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT 

TGCACCACCTGTCACTTCTGCTCCGAAGAGAAAGCCTATCTCTAGGCCGGTCAGAAGGAT
************************************************************ 



12023 
12024 



GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG 
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
GTCAAGACCTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCACATGCTCCACCGCTTG
************************************************************ 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
TGCGGGCCCCCGTCAATTCCTTTGAGTTTCAACCTTGCGGTCGTACTCCCCAGGCGGAGT 
************************************************************ 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
GCTTAATGCGTTAGCTGCGGCACTAAGCCCCGGAAAGGGCCTAACACCTAGCACTCATCG 
************************************************************ 

TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA 
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA 
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA 
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA 
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA 
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA
TTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCA
************************************************************ 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
GCGTCAGTTACAGACCAGAGAGCCGCTTTCGCCACCGGTGTTCCTCCATATATCTACGCA 
************************************************************ 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
TTTCACCGCTACACATGGAATTCCACTCTCCCCTTCTGCACTCAAGTCCTCCAGTTTCCA 
************************************************************ 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
AAGCGTACAATGGTTAAGCCACTGCCTTTAACTTCAGACTTAAAGAACCGCCTGCGCTCG 
************************************************************ 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
CTTTACGCCCAATAAATCCGGACAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCA 
************************************************************
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
CGTAGTTAGCCGTCCCTTTCTGGTTAGTTACCGTCACTTGGTAGATTTTCCACTCCTACC 
************************************************************ 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
AACGTTCTTCTCTAACAACAGAGCTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGT 
************************************************************ 

TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
TGCTCGGTCAGACTTCCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTC 
************************************************************ 
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG 
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG 
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG 
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG 
TGGGCCGTGTCTCAGTCCCAGTGTGGCCGATCACCCTCTCAGGTCGGCTATGTATCGTCG 
************************************************************ 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
CCTTGGTGAGCCTTTACCTCACCAACTAGCTAATACAACGCAGGTCCATCTCACAGTGAA 
************************************************************ 
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC 
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC 
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC 
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC 
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC
GCAATTGCTCCTTTTAAATAACTAACATGTGTTAATTACTCTTATGCGGTATTAGCTATC 
************************************************************ 
12023
12024 
12019 



GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC 
GTTTCCAATAGTTATCCCCCGCTATGAGGCAGGTTACCTACGCGTTACTCACCCGTTCGC
************************************************************ 

AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
AACTCATCAGTCTAGTGTAAACACCAAACCTCAGCGTTCTACTTGCATGTATTAGGCACG 
************************************************************ 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
CCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCATTAAAAGTTTGAGCTTTGCTCTT 
************************************************************ 

TTCTGTCTCGCTGACAGATTTATTGTTTTTT-GTCATTGACGGATTTACAATGTAAATCC 

-GTCATTGACGGATTTACAATGTAAATCC 
TTCTGTCTCGCTGACAGATTTATTGTTTTTTTGTCATTGACGGATTTACAATGTAAATCC 



TTCTGTCTCGCTGACAGATTTATTGTTTTTT-GTCATTGACGGATTTACAATGTAAATCC 
TTCTGTCTCGCTGACAGATTTATTGTTTTTT-GTCATTGACGGATTTACAATGTAAATCC 
******************************* **************************** 
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTAATGATATATCATAAAAAT
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
ACCCTGCACATTCGTTCATCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
ACCCTGCACATTCGTTCGTCTTGTTCAGTTTTCAAAGGTCTTTGCCTCTCTTGAGACAAC 
***************** *********************** * * * * ** 
ATATCCATCGGGAAGACAGGATTCGAACCTG-CGACACCTTGGTCCCAAACCAAGTACTC



TTCTATATTCTAGCAAACTTATTCTGCTTTGTCAACTACTTT-TTTTAAGTTGTTAACTA 
TTCTATATTCTAGCAAACTTATTCTGCTTTGTCAACTACTTT-TTTTAAGTTGTTAACTA
TTCTATATTCTAGCAAACTTATTCTGCTTTGTCAACTACTTT-TTTTAAGTTGTTAACTA 
TTCTATATTCTAGCAAACTTATTCTGCTTTGTCAACTACTTT-TTTTAAGTTGTTTATAA 
TTCTATATTCTAGCAAACTTATTCTGCTTTGTCAACTACTTT-TTTTAAGTTGTTAACTA 
* * ** * **** ** * ** *** * ** * 
TACCAAGCTG--A-GCTACT-TCCCGAAAAA---TATGCACC---CTAGAGGAGTCGAAC
CGCGTTACTAGAA-GCTGCTCTCTCGAGACAACTTATTCATTATACTAAATATTTCTACT 
CGCGCTAATAGAA-ACTGCTCTCTCGAGACAACTTATTCATTATACTAAATATTTCTACT 
CGCGCTAATAGAA-ACTGCTCTCTCGAGACAACTTATTTAGTTTACTACATCATCTCTTA 
CGCGCTAATAGAA-ACTGCTCTCTCGAGACAACTTATTTAGTTTACTACATCATCTCTTA 
AATGATAATACAATATTAGGTTCGCTTAAGAACTCATTTAGTATACTATAATTTTTTATT 
CGCGCTAATAGAA-ACTGCTCTCTCGAGACAACTTATTTAGTTTACTACATCATCTCTTA 
* * * *** ** *** *** * 
CTCTAACCGCCTGATTCGTA-GTCAGGTACTCTATCC AGTTGA GCTAAG

TCCTGTCAATACTATTTTTGCATTTTTTCTTTTATTTTTAAA-AAGTTAATATTATTTAT 

TCCTGTCAATACTATTTTTGTA TTTTATAAATTTAGTAT - AGACATAACTATTCCTC 

CTTTGTCAACTCTTTTTTCATACT-TTTTCTACATTTTCTGA-AAATGTAGATCAGGCTC 
CTTTGTCAACTCTTTTTTCATACT-TTTTCTACATTTTCTGA-AAATGTAGATAGAGCGC 

TGTTGTCAATAGGTTTTAAAAA AATCTCAGAGAAAACCCTGAGATTTTT 

CTTTGTCAACTCTTTTTTCATACT-TTTTCTACATTTTCTGAAAAAAGTTTCCTGTTGGC 
* * ** * 

GGTGCTAAAT ATTATATGCCGA GGACCGGAATC G A 

AGTAACTAAC CTTCTATACTTGTTGA-ATGGATAGCATTT T T 

TATATTCAATTAAGAGAAATTATATAACCACTATTGAGAAATGTAGTC T- - -A 

AA-GCTTAAC- - -GATTCTTTTTAAAATCATTA AATTTTAAAA C---A 

AAGAAAAAAAGAGGTCTCACCTCTTTTTATTTCTTAGTAACTACTACA A- - -A 

TAAATT- -ATGTTACAAAGTT- -AATTTCCTT TAGCTTCAATT AAA 

TAACACCAATAACATAGAGTTTAAAATTCCATAC--CTAAATTTATTTTATTAGTAAAAA 
ACCGGTACGATGTTTACC-A--TCGCAGGATTTTAAGTCCTGTGCGTCTGC--CAGTTCC
ACCGTTGTCATGTTCAT--A--TTTCATCTTCTTAATTCACAAATTTAAACTTCATCTTC
GCGATTAAATTCTTTGCTCA- - TQ3AA-AATATCCAATAAATATAATAATGCATAAAACG 

AATTTCAGACATGTTGC CAAA-GTTTTGATATTATTACTATAAT- -ATAGTTTG 

TCTATTAGGATCGTTACCTT- - CAGAATAACTTTCAACACCCTCTATAGT- TGCAATTGT 
CCTAGTTCGCCATCTTCACG-CTTGTAAAGGACATTTGTCGTATTATCTTCTGCATCT-- 
AATAAAAGATGGGCTAGCCATCTTTTATAATATTTGTTTTTTATATTCTTCAGCTTCTTG 



G-CCA CCCCGGCCTCTAACAAGCGAACGACGGGGTTCGAA- CCCGCGACCCT 

A-TAAAAAATACCCTTCAAATTTTATCTAAATTTGAAGGGTATTTGAAATTTATAAAGTT 
C-CTGCTTACGAAATATAAACAAR-ATTGTTTGCAT--TTCGTAAACAAGCGTTACCTAT 

T-AGAGGAGAATARTATGGGCCAA-GAACCTATCAT--CGAATATCAAAATAT CAA 

T - TTATGAACAGTTTTTCGCTCACTGTTACTCATAGGATCCATATGGTAAGGTTCATTAG 
--GTATAGATAAAGAAATCATGACCTAAaAGTTCCATTTGCAACAATGCTTCCTCAACAT 
GGGTGTAGATAAAACAAA-ATGACCAGGGGTAATCTCGTGCATTTGACGTTCTTGTCCGT 
CAC--CTTGGCAAGGTGATGTTCTACCACTGAACTACGT-TCGCACTAAAGACACTATT-
CTT--TAAAAATATATGATGACTTATTTTTTATCTTCTTCTTGCATTTTTTCTTTGATTT
TTA--ACAATATATGATGAGTGTTCCCGCTGAGAATAATTCTCAGCGGTAGACCAGAGCT 

TAA--AGTGTATGGGGAAAATGTTGCGGTTGAAGACA TTAACCTTAAAATTTACCC 

TCT- -CTAAAACACGCCTAGCTATTTTTTTAGAAAAA TCAATTAAAGTTTCTGTAC 

CCA--TTGGTTTTAGATTAACATTCTTAGTACGTACAAT--T-CTTTGGCTTACTGCTTC
CTTGCTCAATAGCTGGATTATACGGCTGGTGAACACGTT--GACGTTCACTCTCCGGATC 
CATCGTATGATAACGCTCTTGCTTTATCTTCA TCATTTTCTGTCTCAGGCATTTTAC

AGACTARGAATCGATTGATTCCATCATC^^ 

TGGT-- -GATTTTGTTTGTTTCATCGGTA CGAGTGGATCAGGTAAAACAACAT 

GATGCTCAACGTAGTCATGGACATTAATGGA TACTGAAAAACTCTTAGAAAAGCGG 

TTCATCTGGCTCAGCC TCAAATTCTGTTGTGAAAAC TTGACTTGCTGGAATC 

TGGTTCTGG^TMCTGATAATAGACTCTTCX3TATABGGGTGGA 



12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



CTGTCTCAAAAATCGATTTAATCTGAGCAGCATCAA GAGTCTC&TATTTTAAGAG 

TAATTCA-ATAATTGCCATTGGGGCAGCATCGCCAC--GGCGTGGTTCTGT-TTTAAGAA 
TAATGCGTATGGTTAACCATATGTTAAAACCAACAA- -ATGGTACTCTATTATTTAAGGG 

TCATGAAGATAATT TTGTGCTAACAACTGCAACGATTTTAATACTTT-TCCATGAT 

TTTTCACGATATT TTTTCGCAATTTTA GTTTTATT - TTTACGA- 

ATCATCAGATGTTC- - - CAACTTCTAACAGTTTCCCCCAATGCATAACACC-GATACGAT 



GGCTTCTGCAATTAATTTATGAGTATCACGGTTTTCGTTGATAATATCAGCTGCC--TTA 
TACGAGTGTATCCTCCG- -TTACGTTCAGCATAACGAGGTGCGATGTCGTC-AAA- -AAG 

AAAAGATATTTCTACTA- -TTAACCCCATTGAATTAAGACGCAGAATTG GAT--ATG 

AACCAATGATGCGCCCAGCTTCTGGCGTTTCTATTTGGAGATTTATTTGTC-GCT--TAC 

ATTTGAC GCTCAATTTT ATCAACAACTAAGTCAAT 

CTGAAATGTATTTTACC--ATAGACAAATCATGTGCGATAAACAAATAAGTCAATCCTTG 



12023 
12024 
12019 
12021 
12020 
12018 
12022 

12023 
12024 
12019 
12021 
12020 
12018 
12022 



TTACGTGCTTCATTAAGAAGGTGACGAACTTCATCATCAATAAGTTGTGCAGTTTGAGCA 
TTTTTGAAGAGCTGTTGTTGATGTAT-ATTT-ATCAGAAGCTTCATCATAGTTTTCTGAT 
TTATCCAAAACATTGGTTTAATGCCTCATATGACCATTTACGAAAATATAGTTCT-TGTA 

TTGTCGTAG - - TTTCTATT3TTGCAT- CTAAATCCATCTCATAGATGATATTTTC A 

TGACCCATACATATCTTGTGAAACATCTTCTGCTCGTAAAGTAATAGAATCTATTAAG - - 
TTCTCTTTGCAATTTTTGCATTAAATTAACAACTTGTGCTTGGATTGAAACATCTAAGGC 



GAATATGATTTTTCAGGTGACATTTGACCA--GCCATCATTGCGTGGTTGCCTTCGTATT 
GCAATTTCATTACGTACATAAGCAGCAGCT--TGACGACGAGCATGTAAATCACCACGTT 
CCAAAATTATTGAAATGGTCAGAAGAAGCT - - AAAAGA- GCTAAAGCAAGGGAACTTATT 
ACGTATTTAGTCACCTGAGCAGCTGCTACT--TCAATATTAGGAAGTAGGTCAATTTTTT 

ATTGTTACTTCAACTTTTGCGGTCTT CTCTCTGTATACTTTGAGGTTGACTCT 

AGATATTGGTTCATCAGCAATGATAAATTTAGGCTCTACTGCTAAAGCACGTGCAATCCC 
12023

12024 GAACTGGTCCAAGTTTCTCGCTCATACCATATTCAGTTACCATAGCGCGGGCCATAGCAG

12019 TACCTAGAGTAATCATTTTTTCAACTGTTTTACGGATTTCTTTAGCACGTGCTTCAGTAG 

12021 AAATTAGTTGAATTACCCGAAGAA-TATTTGGATCGCTACCCTAGTGAGTTGTCTGGCGG

12020 CAATAGCTTCACTTGTAGTTACAACGTTTTTATCATCAATTTTTGGAAGT--TCTGGTTG

12018 AGTATCTAATTCTTGTGCTTC ATTAAAGTATTTTTCAft.CTTTAGAGAGTTTGGTC 

12022 GATACGTTGTCGTTGTCCACCTGAARATTCATGCGGATAACGTGTTAAATGATCTTTATT

12023 

12024 TGGCTTGTTCGAAGTCATTTGAGGCACCTGTTGTCTGAGCGTTGAAAATAATTTCTTCCG

12019 TTACAATTGATTCGTTGATAAGAAGATCGGTTGTCAAATCA--CGAAGCATTGCCTTACG

12021 TCAGCAACAACGTATCGGTGTCATTCGCGCTCTTGCAGCAGACCAAGATATTATTTTAaT 

12020 T GATACGTCTTCTTTTTCAAGCGT-TTCATCAACCTCCTCTATATATTCTTCCACC 

12018 TCAACATACTCA- - CGAATAGCTTCTG TTACTTCGATGTTTTCACCACGAAT-ACT 

12022 TAACCCTACAAGATCTAATAGGGCCTGAACTTTACTATCACGATCTGATTTTGATTTAGC

12023 

12024 CTACACGTCCTCCCATAAGACCTGCTAATTGCTCTTTCATATCATCTTTTGARAGAAGCA 

12019 TTGTGAGCTAGT GCGTCCTAGTTTACGGTARGCCATTATGTCCTCCTATTTTA 

12021 GGATGAGCCTTTT GGAGCTCTGGATCCTATTACTAGAGAAGGTATTCAAGACTTA

12020 ACATCTACGCTA GACGGTACATTCTTAATATTTTTTAACG--CTACCGATTCA

12018 GTATTTAATCAT ATGAGTACCTCTTTCTTGCGTTGTTAACGCTTTCTATACTCTTA 

12022 TAATTTATGTAT ATCTAAACCT-TCTGCTACGATATCACGAATCTTCATACGGCCG 

12023

12024 TTTGATCTTCTTTAGGT AAAGCAATCATATATCCACCTGCACGACCACGTGGTACG

12019 TTTATCGTTTTTTAATC CAAGACCTAGATCGGCAAGTTTGATTTTAACTTCTTCAA 

12021 GTCAAGTCTCTTCAGG AAGAAATGGG- - GAAAACTATCATCTTAGTTACT - CAT 

12020 TTAATATCAGTTACTT CGTCGGT-GATACCTTCTATTTCAACTTTTGCTG

12018 TTATAACC-GCTT TCATGAAAA-

12022 TTTAAGCTAGCCTGAGGATCCTGARA&ATCATC 

12023 

12024 ATAGTAACTTTATGAACAACTCGCGCATTTGAAAGAATCaAACCGaCAATTGTGTGCCCA 
12019 GACTCrTAraTCCTAAGTTTOSGACTTTCATCATTTCAGGCTCAGTTTTTTCTG-TTAAA

12021 GA---T-ATGGATGAAGCCCTCAAGTT--AGCAACAAAAATT--ATTGTTATGG-ACAAT

12020 GC TTTTTACCAAAGCCCAAAAAACCTTTTTTCTCACGTGATACAACTTTTATATGTG 

12018 

12022 GCTTTACCTTTCAGATGTGAGATCACTTCTCCATTAAAGGTAATTTCTCCATCAGAAATA 

12023 

12024 GCTTCATGGTAAGCAACCATAGCTCTTTCTCTTTCAGAAATAGTACGATCTTTTTTAGAA 

12019 TCAAATACTGTATTAATTCCAGCACGTTTTAAACAGTTATATGAGCGCACTGACAAATCA 
12021 GGTAAAATGGTCCAAGAAGGGACACCCAATGATCTCTTACATCATCCTGCTA

12020 CCCTCAATCGTGAAATGTTTAACTCTTGTAGTCCTTTTTCAATAGCTTCTTCTACAGTCG

12018 

12022 TCATAAAGTTTTAAAATTGAACGTCCAACGGTTGTCTTTCCTGATCCAGATTCCCCAACT

12023 

12024 GGACCAGCAATTACACGGTCTTCTGCTTCATCAATATCTGAAGCATCAATAACTTTTTTA

12019 AGTTCCTCAATTGTCCGGTCAAGCACTTTCTCATCGTTCACTTTCTCTGTTTCCTTCATT 

12021 

12020 CTCCTGTAAATAATACC 

12018 

12022 AATCCAAACACTTCACCTTCATAAATGTCAAAACTAACATTATCAATTGCTCTCACTTCA

12023 

12024 TTTCGTCGCGCAGCAACTAAAGaGCTTCATTGAGAACATTCTCCAAATCAGCACCAACA

12019 ACTTCAGTTGCTTTAGCAACCTCTGTTAAATCAGTAAACAAGTTTAAGTGTTCAATTAAG 
12021

12020 

12018 

12022 TTAGCTTTTCCTTTATTGAAGGTCAAAGAAACATTTTTGACTTCAACTAATTTTTTTCGA

12023

12024 AATCCTGGGGTTTGTTGAGCTACTACTTTTAAGTCAACATTATCTGCTAATGGTTTATTT. . . 

12019 ACGCGAGCTGAAAGACCAAGAGCATCCTCAGGAATGA 

12021 
12020

12018 

12022 TTTTCAGTCATTAGGCT 
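
The rows of asterisks printed under the alignment blocks above appear to follow the usual multiple-alignment convention: an asterisk marks a column in which every aligned sequence carries the same base, and a blank marks a column containing a mismatch or a gap. A minimal Python sketch of that convention is given below, assuming rows of equal length; the function name `identity_line` is illustrative only and is not part of the original disclosure.

```python
def identity_line(rows):
    """Return a marker line with '*' where all rows share the same base.

    Assumes every row has the same length; a gap character ('-') never
    counts as a match. The function name is hypothetical.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all alignment rows must have the same length")
    marks = []
    for column in zip(*rows):
        first = column[0]
        same = first != "-" and all(base == first for base in column)
        marks.append("*" if same else " ")
    return "".join(marks)


# Two rows from the alignment above that differ only by a single gap.
block = [
    "TTCTGTCTCGCTGACAGATTTATTGTTTTTT-GTCATTGACGGATTTACAATGTAAATCC",
    "TTCTGTCTCGCTGACAGATTTATTGTTTTTTTGTCATTGACGGATTTACAATGTAAATCC",
]
print(identity_line(block))
```

For these two rows the marker line carries a single blank at the gapped column, which matches the pattern printed under that block.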




It will be understood that the invention has been described by way of example only and modifications may 
be made whilst remaining within the scope and spirit of the invention. 
TABLE I - THEORETICAL MOLECULAR WEIGHTS FOR GBS PROTEINS



              Expected mol. weight (dalton)
GBS#          GST-fusion  His-fusion  Native
1             78425       53460       49720
2             40035       15070       11330
3             90305       65340       61600
4             43115       18150       14410
5             158835      133870      130130
6             39265       14300       10560
7             44985       20020       16280
8             56315       31350       27610
9             50265       25300       21560
10            96465       71500       67760
11            91515       66550       62810
11d           85905       60940       57200
12            64455       39490       35750
13            40475       15510       11770
14            33325       8360        4620
15            44765       19800       16060
16            73475       48510       44770
17            46745       21780       18040
18            54335       29370       25630
19            46085       21120       17380
20            47625       22660       18920
21            56535       31570       27830
21 long       66435       41470       37730
22            60055       35090       31350
23            60165       35200       31460
24            58405       33440       29700
25            50265       25300       21560
26            118245      93280       89540
28            63795       38830       35090
29            50595       25630       21890
30            44215       19250       15510
31            63795       38830       35090
31d           58735       33770       30030
32            40585       15620       11880
33            71495       46530       42790
34            69295       44330       40590
35            56535       31570       27830
36            59065       34100       30360
37            46965       22000       18260
38            61815       36850       33110
39            65225       40260       36520
41            75235       50270       46530
42            46745       21780       18040
43            58955       33990       30250
44            52355       27390       23650
45            43555       18590       14850
46            59835       34870       31130
47            84255       59290       55550
48            86455       61490       57750
48d           106695      81730       77990
49            59615       34650       30910
50            94155       69190       65450
51            47075       22110       18370
52            55435       30470       26730
53            110215      85250       81510
54            73365       48400       44660
55            36295       11330       7590
56            34865       9900        6160
57            51145       26180       22440
58            128805      103840      100100
59            99215       74250       70510
60            63575       38610       34870
61            68085       43120       39380
62            105485      80520       76780
63            64125       39160       35420
64            112745      87780       84040
65            72485       47520       43780
66            49715       24750       21010
67            120335      95370       91630
68            131225      106260      102520
68d           103065      78100       74360
69            53895       28930       25190
70            74465       49500       45760
70d           59725       34760       31020
71            56755       31790       28050
72            75565       50600       46860
73            72815       47850       44110
74            131225      106260      102520
74d           95475       70510       66770
75            114725      89760       86020
76            198875      173910      170170
77            78535       53570       49830
78            48835       23870       20130
79            58185       33220       29480
79d           50815       25850       22110
80            81835       56870       53130
81            89205       64240       60500
82            40475       15510       11770
83            62585       37620       33880
84            122645      97680       93940
85            70175       45210       41470
86            84035       59070       55330
87            44435       19470       15730
88            73365       48400       44660
89            143325      118360      114620
90            93495       68530       64790
91            88325       63360       59620
92            193595      168630      164890
93            95585       70620       66880
94            77435       52470       48730
95            60605       35640       31900
96            57195       32230       28490
97            138375      113410      109670
98            82055       57090       53350
99            60715       35750       32010
100           53015       28050       24310
101           59395       34430       30690
102           40695       15730       11990
103           56975       32010       28270
104           120005      95040       91300
105           179735      154770      151030
105dNterm     127265      102300      98560
105dCterm     81285       56320       52580
106           85795       60830       57090
107           89535       64570       60830
108           64565       39600       35860
109           75125       50160       46420
109d          70725       45760       42020
110           53895       28930       25190
111/190       60165       35200       31460
112           63905       38940       35200
113           59175       34210       30470
114           51915       26950       23210
115           98225       73260       69520
116           73475       48510       44770
117           47515       22550       18810
118           42235       17270       13530
119           109225      84260       80520
120           71385       46420       42680
121           65115       40150       36410
122           46855       21890       18150
123           68305       43340       39600
124           54115       29150       25410
125           57305       32340       28600
126           56865       31900       28160
127           80845       55880       52140
128           39925       14960       11220
129           43775       18810       15070
130           82275       57310       53570
130d          63245       38280       34540
131           89755       64790       61050
132           49055       24090       20350
133           54445       29480       25740
134           42015       17050       13310
135           65225       40260       36520
136           54885       29920       26180
137           63465       38500       34760
138           40145       15180       11440
139           38165       13200       9460
140           43445       18480       14740
141           49935       24970       21230
142           79745       54780       51040
143           33545       8580        4840
144           49165       24200       20460
145           63025       38060       34320
146           107025      82060       78320
147           156965      132000      128260
148           41905       16940       13200
149           62365       37400       33660
150           54665       29700       25960
151           50412       25447       21707
151L          50045       25080       21340
152           45535       20570       16830
153           46965       22000       18260
154           101525      76560       72820
155           62585       37620       33880
156           61265       36300       32560
157           74025       49060       45320
158           52025       27060       23320
159           41025       16060       12320
160           82825       57860       54120
161           95365       70400       66660
162           42015       17050       13310
163           69405       44440       40700
164           42345       17380       13640
165           43555       18590       14850
166           38055       13090       9350
167           50375       25410       21670
168           32555       7590        3850
169           43445       18480       14740
170           64015       39050       35310
170d          59945       34980       31240
171           49825       24860       21120
172           62365       37400       33660
173           96795       71830       68090
174           [illegible] 20130       16390
176           [illegible] 34210       30470
1 I U         [illegible] 30470       26730
177           UUZ. 1 <J   41260       37510
178           62365       37400       33660
17Q           6R615       33650       29810
180           37615       12650       8910
181           63686       38720       34980
189           90086       65120       61380
[illegible]   [illegible] 62260       58520
183           57855       32890       29150
184           46415       21450       17710
185           40695       15730       11990
186           86685       60720       56980
187           56205       31240       27500
188           61595       36630       32890
189           60165       35200       31460
191           116705      91740       88000
192           69625       44660       40920
193           98005       73040       69300
194           49385       24420       20680
195           81065       56100       52360
195L          147615      122650      118910
195L N-term   91405       66440       62700
196           69515       44550       40810
197           99325       74360       70620
198           73805       48840       45100
199           158285      133320      129580
200           132325      107360      103620
201           74538       49573       45833
202           157295      132330      128590
203           61705       36740       33000
204           39705       14740       11000
205           55985       31020       27280
206           56645       31680       27940
207           44765       19800       16060
208           59725       34760       31020
209           62145       37180       33440
209d          56425       31460       27720
210           60935       35970       32230
210d          53675       28710       24970
211           64895       39930       36190
212           60825       35860       32120
213           45205       20240       16500
214           38935       13970       10230
215           45205       20240       16500
216           91515       66550       62810
217           36075       11110       7370
218           81065       56100       52360
219           56535       31570       27830
220           54555       29590       25850
220           50155       25190       21450
221           41465       16500       12760
222           47405       22440       18700
223           42895       17930       14190
224           45865       20900       17160
225           56645       31680       27940
226           44875       19910       16170
227           46195       21230       17490
228           46525       21560       17820
229           35855       10890       7150
230           51915       26950       23210
231           60935       35970       32230
231d          58735       33770       30030
232           41795       16830       13090
233           35635       10670       6930
234           43115       18150       14410
235           58295       33330       29590
235d          48395       23430       19690
236           46525       21560       17820
237           44215       19250       15510
238           59725       34760       31020
239           63905       38940       35200
240           51475       26510       22770
241           45095       20130       16390
242           43225       18260       14520
243           119455      94490       90750
244           48065       23100       19360
245           48615       23650       19910
246           49605       24640       20900
246d          45975       21010       17270
247           58955       33990       30250
248           92505       67540       63800
248d          70835       45870       42130
249           103835      78870       75130
250           136505      111540      107800
251           52135       27170       23430
252           51695       26730       22990
253           74245       49280       45540
254           59615       34650       30910
255           69075       44110       40370
256           47845       22880       19140
257           60495       35530       31790
258           67975       43010       39270
259           79415       54450       50710
260           48175       23210       19470
261           55765       30800       27060
262           75345       50380       46640
263           63465       38500       34760
264           47185       22220       18480
265           56315       31350       27610
266           51365       26400       22660
267           88655       63690       59950
268           50265       25300       21560
269           60495       35530       31790
270           59285       34320       30580
271           56315       31350       27610
272           118355      93390       89650
272d          98885       73920       70180
273           70945       45980       42240
274           56205       31240       27500
275           47515       22550       18810
276           147945      122980      119240
277           87005       62040       58300
277d          75675       50710       46970
278           52245       27280       23540
279           79415       54450       50710
280           88655       63690       59950
281           74465       49500       45760
281d          71495       46530       42790
282           44765       19800       16060
283           [illegible] 20240       16500
284           67645       42680       38940
285           57525       32560       28820
286           41355       16390       12650
287           61045       36080       32340
287d          57085       32120       28380
288           53675       28710       24970
288d          51035       26070       22330
289           65005       40040       36300
289 long      71825       46860       43120
290           47405       22440       18700
291           63795       38830       35090
292           103505      78540       74800
293           115935      90970       87230
293d N-term   73805       48840       45100
293d C-term   70835       45870       42130
294           75785       50820       47080
295           89425       64460       60720
296           60385       35420       31680
297           100205      75240       71500
298           54335       29370       25630
299           62255       37290       33550
300           130895      105930      102190
301           54885       29920       26180
302           80075       55110       51370
303           53235       28270       24530
304           75125       50160       46420
305           78645       53680       49940
306           67975       43010       39270
307           86675       61710       57970
308           59285       34320       30580
309           62695       37730       33990
310           58845       33880       30140
311           76445       51480       47740
312           64785       39820       36080
313           65995       41030       37290
314           52135       27170       23430
315           51695       26730       22990
316           41795       16830       13090
317           179295      154330      150590
317d N-term   115935      90970       87230
317d C-term   92160       67402       63360
318           70065       45100       41360
319           61925       36960       33220
320           57965       33000       29260
321           83705       58740       55000
322           76628       51663       47923
323           86345       61380       57640
324           86345       61380       57640
325           82605       57640       53900
326           91515       66550       62810
326L          172695      147730      143990
326L N-term   113955      88990       85250
327           279175      254210      250470
327d N-term   139915      114950      111210
327d C-term   167965      143000      139260
328           97602       72637       68897
329           113955      88990       85250
330           83595       58630       54890
331           60825       35860       32120
332           75675       50710       46970
333           63465       38500       34760
333d          57965       33000       29260
334           38275       13310       9570
335           43555       18590       14850
336           67645       42680       38940
337           75235       50270       46530
338           54995       30030       26290
339           76665       51700       47960
339d          72925       47960       44220
340           86565       61600       57860
341           38385       13420       9680
342           61595       36630       32890
343           60385       35420       31680
344           55875       30910       27170
345           40585       15620       11880
346           53895       28930       25190
347           55325       30360       26620
348           58405       33440       29700
349           98335       73370       69630
350           53895       28930       25190
351           82165       57200       53460
352           111315      86350       82610
352d          105485      80520       76780
353           55325       30360       26620
354           42345       17380       13640
355           52135       27170       23430
356           59065       34100       30360
357           40255       15290       11550
358           60495       35530       31790
359           78865       53900       50160
360           73695       48730       44990
361           109005      84040       80300
362           125945      100980      97240
362d N-term   63355       38390       34650
362d C-term   91295       66330       62590
363           53125       28160       24420
364           75015       50050       46310
365           102075      77110       73370
366           68415       43450       39710
367           76885       51920       48180
368           44765       19800       16060
369           142115      117150      113410
370           94595       69630       65890
371           65555       40590       36850
372           55105       30140       26400
373           50265       25300       21560
374           57525       32560       28820
375           66875       41910       38170
376           48065       23100       19360
377           73805       48840       45100
378           58955       33990       30250
379           68855       43890       40150
380           47405       22440       18700
381           66875       41910       38170
382           50815       25850       22110
383           57085       32120       28380
384           77985       53020       49280
385           75675       50710       46970
386           39485       14520       10780
387           54555       29590       25850
388           45645       20680       16940
389           43005       18040       14300
390           62255       37290       33550
391           54775       29810       26070
392           71385       46420       42680
393           55765       30800       27060
394           59725       34760       31020
395           72375       47410       43670
396           34865       9900        6160
397           113625      88660       84920
397d          100865      3740        72160
398           56755       31790       28050
399           55435       30470       26730
400           74135       49170       45430
401           59395       34430       30690
402           78095       53130       49390
403           64455       39490       35750
404           61595       36630       32890
405           45975       21010       17270
406           36955       11990       8250
407           82715       57750       54010
407d          71715       46750       43010
408           45315       20350       16610
409           70395       45430       41690
409d          59600       34842       30800
410           62475       37510       33770
411           41355       16390       12650
412           35965       11000       7260
413           59175       34210       30470
414           50375       25410       21670
415           46195       21230       17490
416           42455       17490       13750
417           77985       53020       49280
418           42125       17160       13420
419           47515       22550       18810
420           67755       42790       39050
421           62915       37950       34210
422           60165       35200       31460
423           74245       49280       45540
424           89975       65010       61270
424           77325       52360       48620
425           116045      91080       87340
426           83815       58850       55110
427           41135       16170       12430
428           55325       30360       26620
429           59175       34210       30470
430           53785       28820       25080
431           54005       29040       25300
432           65665       40700       36960
433           40915       15950       12210
434           44545       19580       15840
642           91845       66880       63140
643           78975       54010       50270
644           49605       24640       20900
645           59725       34760       31020
646           61595       36630       32890
647           55875       30910       27170
648           59835       34870       31130
649           76115       51150       47410
650           51475       26510       22770
651           53345       28380       24640
652           49715       24750       21010
653           44655       19690       15950
654           51255       26290       22550
655           65995       41030       37290
656           57525       32560       28820
657           62805       37840       34100
658           60165       35200       31460
659           60275       35310       31570
660           71495       46530       42790
661           60605       35640       31900
662           62695       37730       33990
663           89535       64570       60830
664           45315       20350       16610
665           41135       16170       12430
666           47075       22110       18370
667           53162       28197       24457
668           43555       18590       14850
669           48505       23540       19800
670           45315       20350       16610
671           36940       12182       8140
672           40130       15372       11330
673           41450       16692       12650
674           45300       20542       16500
675           55970       31212       27170
676           65650       40892       36850
677           54320       29562       25520
678           77750       52992       48950
679           60480       35722       31680
680           64440       39682       35640
681           93040       68282       64240
682           84790       60032       55990
683           15950       44655       19690
684           11880       40585       15620
685           16280       44985       20020
686           21340       50045       25080
687           9350        38055       13090
689           55105       3740        26400
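
Most rows of Table I are consistent with a simple rule. The text does not state this rule; it is inferred here only from the tabulated values. The native weight is a multiple of 110 Da (a commonly used average residue mass), the His-fusion weight is the native weight plus 3740 Da, and the GST-fusion weight is the native weight plus 28705 Da. A minimal Python sketch under those assumptions follows; the constant and function names are hypothetical, and rows that deviate from the pattern are not reproduced by it.

```python
AVG_RESIDUE_DA = 110   # assumed average mass per amino-acid residue
HIS_TAG_DA = 3740      # assumed offset, inferred from (His-fusion - Native)
GST_TAG_DA = 28705     # assumed offset, inferred from (GST-fusion - Native)


def theoretical_weights(n_residues):
    """Return (GST-fusion, His-fusion, native) weights in daltons."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    native = n_residues * AVG_RESIDUE_DA
    return native + GST_TAG_DA, native + HIS_TAG_DA, native


# A 452-residue protein reproduces the three values listed for GBS 1.
print(theoretical_weights(452))
```

The residue counts themselves are not given in the table; 452 is simply the native weight of GBS 1 divided by the assumed 110 Da.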



TABLE II - PRIMERS USED TO AMPLIFY GBSnnn PROTEINS 

Forward primers begin 5'-GGGGACAAGTTTGTACAAAAAAGCAGGC-3' and continue with the sequences
indicated in the table below; reverse primers begin 5'-GGGGACCACTTTGTACAAGAAAGCTGGGTT-3' and
continue with the sequences indicated in the table. The primers for GBS1 are thus:

Fwd: GGGGACAAGTTTGTACAAAAAAGCAGGCTCTCAATCTCATATTGTTTCAG 

Rev: GGGGACCACTTTGTACAAGAAAGCTGGGTTATTTTTAGACATCATAGACA 

The full forward primer sequences are given in the sequence listing as SEQ IDs 10968-11492. The
reverse primer sequences are SEQ IDs 11493-12017.
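
The rule above, a common prefix followed by the table entry, can be sketched in a few lines of Python. The constant and function names are illustrative and not taken from the original; the two prefixes and the GBS1 sequences are those quoted in the text.

```python
FWD_PREFIX = "GGGGACAAGTTTGTACAAAAAAGCAGGC"    # common 5' start of forward primers
REV_PREFIX = "GGGGACCACTTTGTACAAGAAAGCTGGGTT"  # common 5' start of reverse primers


def full_primers(forward_tail, reverse_tail):
    """Join the common prefixes to the gene-specific sequences from the table."""
    for tail in (forward_tail, reverse_tail):
        # Reject empty entries and anything that is not plain A/C/G/T.
        if not tail or set(tail.upper()) - set("ACGT"):
            raise ValueError(f"not a plain DNA sequence: {tail!r}")
    return FWD_PREFIX + forward_tail.upper(), REV_PREFIX + reverse_tail.upper()


# GBS1 entries from the table below.
fwd, rev = full_primers("TCTCAATCTCATATTGTTTCAG", "ATTTTTAGACATCATAGACA")
print("Fwd:", fwd)
print("Rev:", rev)
```

The validation step is a design choice for this sketch: it rejects table entries that contain characters other than A, C, G and T, so a damaged entry is flagged rather than silently turned into a primer.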



GBS           Forward                       Reverse
1             TCTCAATCTCATATTGTTTCAG        ATTTTTAGACATCATAGACA
2             TCTAATTACATTATTACATT'I "i'TG  GGGAATGCCTACAAA
3             TCTGATACTAGTTCAGGAATATC       TTTTTTACTATACTTTTTGT
4             TCTGATACAAGTGATAAGAATACT      TTCCTTTTTAGGCTTACT
5             TCTATTTTTCTTCATAGTCCAC        ATTAGCTTCATTTGTCAG
6             TCTGAATGGGTGTTATTAACTC        AGTTTCTTCTTTAAAATCAT
7             TCTACAAATTCTTATTTTAGCAA       CTCTGAAGCTGTAAAACC
8             TCTGTATCAGTTCAGGCGT           TTTATCAATGTTTGAAACG
9             TCTGCTGCTCTAGGACAAC           TAGTAAATCAAGTTTTTGCA
10            TCTTTTGTTG7TGCCTTATT          ATCCCTTCTATTTTCGA
11            TCTCCACCTATGGAACGT            ATGTAGTGACGTTTCTGTG
11d           TCTCAGAAAGTCTATCGGG           ATGTAGTGACGTTTCTGTG
12            TCTAGTGAGAAGAAAGCAAAT         ATTGGGTGTAAGCATT
13            TCTTCTTGGAATTATTGGAG          CTTAACTCTACCCGTCC
14            TCTGCAATGATTGTAACCAT          TTTTCTCTTATTAAAGAATT
15            TCTGCATCTTATACCGTGAA          ATACCAGCCGTTACTATT
16            TCTGCCGAGAAGGATAAA            TTTAGCTGCTTTTTTAATG
17            TCTGTTTATAAAGTTATTCAAAA       AAATACTACATTTACAGGTG
18            TCTAAGCCTAACAGTCAACA          TTGGTTATTCTCCTTTAAT
19            TCTGATGATAACTTTGAAATGC        ATTATATTTTTGGATATTTC
20            TCTGCAGTGATTGCAAGTC           GGGCTTTTTCTTAAAAA
21            TGTGCTGCATCAAAC               GTTGGCATCCCTTTT
21 Long+A527  TGTGCTGCATCAAAC               CTTTTGATGGGATTGG
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22 


TGTACTAAACAAAGCCAG 


TTGATTTAACGATTTGA 


23 


TGTCAATTAACCGATAC 


TTTATCTCCTCTAAAATAATG 


24 


TGCTCAAATGATTCAT 


CTTTGATAAGTCAGACCA 


25 


TCTAAAAGTTCACAAGTTACTACT 


GTAACCCCAAGCTGAT 


26 


TCTAGTCATTATTCCATAAAATT 


TGATTTTGCAATATCAA 


28 


TCTAATCATATGCTGATTGAG 


TTTTTGTAATTTAAGTACTAA 


29 


TCAGTTTGGATGTTAAC 


TTCTTTTATATTAAGAGCTT 


30 


TCAACAAATGCAGATG 


ATTCGGATAAATGTAGC 


31 


TGTTTTGTCATTATTGATAG 


TCCATTTTTATCCTCAC 


31d 


TCTCTAACTTGGTTTTTATTAGA 


TCCATTTTTATCCTCAC 


32 


TCTGGTTTAAAAGTGACTGAA 


ATGACCTCTACTTTCCA 


33 


TCTCATCATTTAGGTAAGGAA 


CTTGTAATCACTTGGAC 


34 


TCTGTTAGTAATCGCTACAATC 


ATTAATCATGGTATTGGT 


35 


TCTAATCAAGAAGTTTCAGC 


CCATTGTGGAATATCA 


36 


TCTCGAGTTTTAGCGGATA 


TTTGTAAAGCAGTTCTT 


37 


TCTGTATTATTTTACCAATCACA 


ATCATTCATATGATCTCTAGA 


38 


TTAGGAGTGGTAGTTCAT 


ATTTTGATTGATTCTACTC 


39 


TTTTTATTGTTAGTATTAGC 


TTTTGTTTTTTTCAAATA 


41 


TCTGTTTATCTAGCGGTTAGA 


ATCTTCAACGTCCTCC 


42 


TATAACAGTTTAGTTAGAAGTC 


AAAGTCAAAGGAAACTT 


43 


TTTAAAGGGTTTACATATT 


TTCTTTATCTAATTTATAATAG 


44 


TTTAATACAATTGGTCG 


TTGCAATGTTTTTTCT 


45 


TCTATGGAAAAAATTAGGATT 


TAAACTTTGGATAATCTGT 


46 


TCTAGAGATGAGCAAGAAATA 


GTTGAAATTTTGATATGA 


47 


TCTCAACAGATAGGTCTTTATAA 


CTCCTTTACTATATAGCTAACT 


48 


TTTCTCTATAATTACTTCAAT 


TTGTTTGTGAAGTAAAAC 


49 


TCTAATAAGGCATTATTAGAGG 


TGATAATATCTCCATATTTT 


50 


TCTACACATTTAGTTGACTTAAC 


GCATTGGCGCCATA 


51 


TCTAGTAAACAACACATTTATCTA 


TTCTACACGACTTTTATTC 


52 


TCTCAAGAAACTCATCAGTTG 


AAGACCTCCTCGAGAT 


53 


TCTGCAGAAGACATTGTTACA 


TGTTTTTTCTTTCTGTTG 


54 


TATAATTTTTCGACTAATGA 


TGGATTAGTTTGACCTG 


55 


TCTGACACAGTGTCTTATCCT 


TTTATCGTAAGCACTTAGG 


56 


TCTGTGGAGCAAGTGGCCA 


CTCCTTCCAGGCATCG 


57 


TCTCAAGAACTAAGTAACTTTGA 


GTAAAAGTATCTTAAATAGTCA 


58 


TCTACTGAAACGTTTGAAGG 


TGCCATTCCTCCTCT 


59 


TCTGATGAAGCAACAACTAA 


TGTTACCTTTTTATTTTCT 


60 


TCTAATAAAGATAATCAAAAAACT 


TTTTTCATGCGATTGA 


61 


TGTTTCTTTTTTATTCCA 


GAGACGTTTCTTATACCTT 


62 


TATTACTTTGATGGTAGTTT 


TGTACCATATGTTCTCTCT 


63 


TCTGTTCAATCATTAGCAAA 


AAAAGTTGGACTACTTTC 


64 


TTTAAAGGTAATAAGAAGTTG 


TCGTTTTCCACCC 


64d 


TCTAGTCAAGTTGACTCTGTTA 


TCGTTTTCCACCC 


65 


TCTCAAAACCAGGTGACTG 


ATTTGGGTAAATATAGTAAA 


66 


TTAAGATTTTATAACAACGA 


TTTACGACTAACCTCAAC 


67 


TCTAATGTTTTAGGGGAAA 


AATTCCTTTTGGTGG 


68 


TCCCAAAAGACTTTTG 


GGCAGAATACACCTTC 


68d 


TCCCAAAAGACTTTTG 


GGCTGACGTCGACGCA 


69 


TCTAAAGTTTTAGCCTTTGA 


AACTCTCTTAATATATTCTTCT 


70 


TCTGAAATGGCTTTAG 


GTCTTTTTCAATATTCTGT 


70d 


TCTACTAACTTATTGAGTAGAATCA 


GTCTTTTTCAATATTCTGT 


71 


TGTAGCTCAAAATCTCAT 


CTTCTCCTTAGGAGTAACG 


72 


TCTAGTTTATCTATTAAAGATGCC 


ATTATTATCAATTAATAACTCTT 


73 


TCTATCAAAGAGGCGGTAA 


GTCAAACATACTTCCAAA 


74 


TCTAAAGAGGATAAAAAGCTAG 


TTTCGTCGTATAAGCA 


74d 


TCTAGTGTTTCAGGTAGTAGTG 


TTTCGTCGTATAAGCA 


75 


TCTAAAAAATTAAAACACTCAA 


TGTCCTCATTTTTTCAG 
76


TCTGATGAAGTTACAACTTCAG 


AATACTTGCTGGAACAG 


77 


TTATTCCAAAGTAAAATAAA 


GTCTTTCTTCAATTTTGG 


78 


TCTCATAACCATCACTCAGAACACATGT 


GTCGTGATTTTTATGAGT 


79 


TCTCCCAAGAATAGGATAAA 


CCCAAACTGGCATAAC 


79d 


TCTAGTCAGTATGAGTCACAGA 


CCCAAACTGGCATAAC 


80 


TCTGCAGAAGTGTCACAAGA 


TGAAGGACGTTTGTTG 


81 


TCTTTTGATGGATTTTT 


TTTTTTTAGTTTAAGGCTA 


82 


TCTACAAATGAAAAACGAAC 


GTCCACCTTCCGAT 


83 


TCTGAAATTAAACTCAAAAATATT 


AACATTGTTTTTCCTTTC 


84 


TCTCATACTCAAGAACACAAAA 


ATGGTGATGATGACCT 


85 


TCTCCTAAGAAGAAATCAGATAC 


ATTAACATTTTGAGGGT 


86 


TCTGCAGAACTAACTCTTTTAA


TTTTGCAAAATCAACA 


87 


TCTGCGGATACATATAATAACTA 


GAATAAATAACTGTATTTTTT 


88 


TCTTACCAAAAAATGACG 


ATTTTCATTAATTTCCTCT 


89 


TCTGAAGAGCTTACCAAAAC 


GATAGCTAATTGGTCTGT 


90 


TCTAGATATACAAATGGAAATTT 


TAAAAGATGAGCTTCTCG 


91 


TCTAAAAAAGGACAAGTAAATG 


AATTTCAATATAGCGACG 


92 


TCTGATTCTGTCATAAATAAGC 


CTTGTTTGTCTTTACCTT 


93 


TCTGAATTTTCACGAGAAA 


ATTATCCTTCAAAGCTG 


94 


TACCAATTAGGTAGCTATAA 


TGTGTCATATAATGTAACCA 


95 


TCTGTTAATACAAAAACACTTCT 


TGATCTTAATTTTCGAG 


96 


TCTGGTCAGTCTAAAAATGAAG 


CCAAACAGGTTGATCT 


97 


TCTAGCCAGGAGGTATATG 


ATTTACATCAGACTGTGAC 


98 


TCTGAAACTATTAATCCAGAAA 


TTTATGGCCAATAACA 


99 


TCTACAAGTATGAACCATCAA 


TTTTTTAGTAGTTGTCAATT 


100 


TCTAAGGGGCCAAAAGTAG 


GTAAGCTGAATTTTCGA 


101 


TCTATTACTTTAGAAAAATTTATAGA 


ACGAGAGTGGTTATTGG 


102 


TCTGCCTTTTACTTTGGCA 


TTTCTTCACTCTTTCTAGAG 


103 


TCTATTTTTTCCTTGATCAT 


CGGCCAGTTTTTTCTT 


104 


TCTGGTGAAACCCAAGATA 


AACACCTGGTGGGCGT 


105 


TTAACAATTCATGGACC 


ACTATTTCTAATTGCTCTG 


105d 


TTAACAATTCATGGACC 


TGGTCCCGGTGCGCCA 


105d 


TCTCAAGGACCTCCCGGTG 


ACTATTTCTAATTGCTCTG 


106 


TCTCAAAATCAAAATTCACA 


CTTAGCAGATTCATCCC 


107 


TCTCTGGAGCCTTTTATTT 


TTTACTATTTGAAAATTGG 


108 


TCTGGTAATCGTTCAGATAAG 


TTTCATAGGAACTTGTATT 


109 


TCTATCCAGCAGATCAACT 


GTCCACACCTGCGACT 


109d 


TCTAAACGGGTTCGCTATG 


GTCCACACCTGCGACT 


110 


TCTGTAAAATTAGTATTCGCAC 


TTTACCTAAGTAATATTCTGA 


111.19 


TCTGTTAGCGTTGATAAGGC 


TCCCCGTCTTTTTTGT 


112 


TCTACAATTAAAAATCTCACTG 


GTCGTAATCATAAAAGCC 


113 


TCTAGTAAAATCAAAATTGTAACG 


TTCATAACGAACCATAAC 


114 


TCTAATCTTTTAATTATGGGTT 


TTTGAGTTCTAGCAACG 


115 


TTTCAATACTATTTAAAAGG 


TTTTTTATCTTCTTCTTGC 


116 


TCTACCGAGGAGCCATTAA 


TTTTAAAACCTGGTAAAC 


117 


TCTGAACAATCACAAAAAACA 


TCAGCTCGTACTGTTT 


118 


TCTATGGTGACGGTGCTGG 


GTCCTCCTCAATTGGT 


119 


TCTAGTCAGCCGGTAGGGG 


CTCTTTTATACGCGATG 


120 


TCTGGTGGAGCATTTGCTA 


GTTATTTGCTCGTTGTT 


121 


TCTAATAAAGATAATCAAAAAACT 


TTTCTCAAATGTTTTCAT 


122 


TCTGCTGCCACCAAGAAAG 


TTTCAAATGATCTACAGC 


123 


TCTACAACAAATGTAATGGC 


GGCTAGTGTCTGTCCG 


124 


TCAATGAATTTTTCATTT 


ACCATCTATTTTTACCCC 


125 


TCTACAAAATATCAGCGAATG 


AGAACCCGCACTCTCA 


126 


TCTACTAAGCAAGCAATGTC 


GAACGCAACGGCTGCT 


127 


TCTACAAAAGAATATCAAAATTAT 


TTTCATATCAAAAACTATCG 


128 


TCGACTAATTCGTTAAA 


TTCTTTATCTCTTAATGCTT 
129


TTTGAAATAGTATTGGAAA 


CACAACAGTTATTTTTTCA 


130 


TCTATATTTTCTATTTTTTATTATGT 


AGGCCCTTCTGAGTAG 


130d 


TCTAAAAAACAACTTCACAAC 


AGGCCCTTCTGAGTAG 


131 


TCTAAAACAGATATTGAAATAGC


AAATAATCCAATGGCTG 


132 


TCTATTAAATATTATCATTTGCA 


CTTTTCAAGCTTTTTCC 


133 


TCTGCTTTACGGAACCTTG 


AAAATGATCAGTTTGAGG 


134 


TCTACTATTTCTCAACAACAATAC 


TTTTTGGCTTAAGAAAG


135 


TCTGAAAAAAAGAGTAGTTCAAC 


CTTACGATACATTTTAAATTG 


136 


TCTAATCAATTATCAGAAATCA 


TTCTTTTTTTACTTTAGCG 


137 


TCTCAAGAGTATAAAACAAAAGAG 


CCATTGCAATCCAGCA 


138 


TCTGCTGTATTTACACTCGTC 


ATGTTTATGGCTTGCT 


139 


TCTGGCGGCAAGATAAAAT 


TTTTTGATAAATCCCC 


140 


TCTGATGGGTTAAAGAATAATG 


ATATGTGTATTCATCCTTT 


141 


TCTGATGTTGTAATTAGTGGAG 


TACTTCTATTTTTCCATCTG 


142 


TTCGAATTAAGAGAAAGA 


GTAATGCAATAAATCAAAA 


143 


TCTAGCTTTTTAGTGATTTCA 


GGATTTTAGTTTCGCA 


144 


TATACGCATAGTGGAAC 


CCCATTGATTTCGTCG 


145 


TCTGTTATTATCAGGGGCG 


TACCTCTTTCAATACCAC 


146 


TCTGTTAGTCGTTCTCCGA 


ATTACCGTTAGGTACTGTA 


147 


TCTGAGGAGCAAGAATTAAA 


GGTATGGTTAACAGAATC 


148 


TCTATTCTAACAAAAGCAAGT 


ATATACCCTAGACTTTTTGA 


149 


TCTAGTGGGCGTTCATGGA 


AGGAGTTTTATTGATGATAT 


150 


TCTGATACCCCTAATCAACTA 


AAATGATTGTGGAAAAA 


151 


TGCAGGAGCTGTCCGC 


ATCAAAGAAGTTGACATTG 


151 Long


TCTGTCCGCATTGGTAAAG 


ATCAAAGAAGTTGACATTG 


152 


TCTAACTGCTTAGAAAATGAA 


GTTAGATAAATTAACCAGTG 


153 


TCTAACAACTCCAGCA 


CCCTTTGCTTCGTTGT 


154 


TCTGGAAAGGTCAGTGCAG 


TTCCACAAGTCCGATT 


155 


TCTATTTTATTTTCAGATGAAC 


TTGTTTGATTCGTCCT 


156 


TCTGCATCAGATGTTCAGA 


ACTACCAAACTGCTGG 


157 


TCTAGTGACGTTGACAAATA 


TTGTGTATTTTTAGTTAGGT 


158 


TCTATGACCATTTACTTCAATA 


GTGGATAAAATTCGAAA 


159 


TCTCAAACTATTTTGACGC 


CAGACTGACTAGGAGCT 


160 


TCTGATGAATATCTACGTGTCG 


GACTTGTAATTGATTCGC 


161 


TCTGATGAGGTGGACTATAACA 


GAAGGCACCACCACCT 


162 


TCTATTTTCTTGCTCTTAGTTG 


GTTGTATAGATGAGTTAATCTG 


163 


TCTGAAACTGTCATTCAACTTG 


ACGGTTTTTAAAGAATG 


164 


TATTTTTTAACAACAAAAAA 


TTTTTCTTTATCTTCTGTG 


165 


TCTCCAATTTTTATTGGTTT 


CGATTTTGTAAGAGCTT 


166 


TCTGCATCTTATACCGTGAA 


CGACGAAGCTATTTCT 


167 


TCTACAATTTATATTGCTTGG 


TAAGGCTTGCATTTTG 


168 


TCTGTTGGATTGATGTTGG 


TTTTCCTAAAAATTTTCC 


169 


TGGAAACAAATCACAG 


GGCATCTCCTAGCTTT 


170 


TCTGCAATAGTTTTTACTTTTTT 


TGATAAAGGTAGTTCTACAC 


170d 


TCTGGTTCTTATCATTTAACAA 


TGATAAAGGTAGTTCTACAC 


171 


TCTGCTAGACCCAAACAGT 


TTTTAGATGTTTTTGTGG 


172 


TACACTCATATTGTTGAAAA 


ATGATTGATAATTTTAAGC 


173 


TCTAATAGTACTGAGACAAGTGC 


TGCTTTTTGATATGCC 


174 


TCTGCTTATGTCGTCAATTT 


TAAAATAAAGTTCAGAAAAG 


175 


TCTGAATTACCTTCGTTTATC 


TTTCTCCCTTGACTTTC 


176 


TCTAAACATCCGATACTTAATG 


CTTTTTCTCAGATGCTT 


177 


TCTAATTATCCTTTTGCGA 


GACATTGAAACGGAAT 


178 


TCTGGACTACGCGGAGTAT 


TTTTATCAATGATGTTGA 


179 


TCTGCTATTGGAGCAGCTG 


CATATGACGCAAACGC 


180 


TCTGATAAAGAAGGGATAGAGG 


AGCCTCTTTTCTTGTT 


181 


TCTAAAGAAAAATCACAAACTG 


ACGATTATCAACAAAGTT 


182 


TCTCAAAATAATAAAAAAGTAAAA 


CATTCTTTTAAATACAAATC 
182d


TCTCAAAATAATAAAAAAGTAAAA 


GGGTTTGAAAGTTTTC 


183 


TCAAATGGTCAATCTAGC 


TTTAACTTTAATTACTGGAAT 


184 


TCTAAGGATTCAAAAATCCC 


TTTTTTAATAAGCTTCGA 


185 


TCTGGGCAACCATCTACAT 


TTTTTTGTAAACTTCCTG 


186 


TCTCATTCACAGGATAGCA 


CTTAGATACATTGTTTTTTTC 


187 


TCTGGACGAGGAGAAGTATC 


CTTTCTTTTCTTACTTGC 


188 


TCACAATCTTCTCAAAA 


TTTATTATTTTTAATACTTGAA 


189 


TCTGATAAGTCAGCAAACCC 


CTTCAACTGTTGATAGAGC 


191 


TCTATCACGACATTACAGACT 


TCCTTTAGCAGGAGCT 


192 


TCTAGATATTTAACTGCTGGT 


GTTATACATGTTGTCTGAAG 


193 


TCTATAAAATATCAAGATGATTTT 


CCAAATAATAACACGTTT 


194 


TTAGAAGTCAGAGAGCAG 


GCTATCCCTTTCCAAT 


195 


TCTATTATGGAGACGGGTA


TGTATTTTTAATTTGTTTTC 


195L 


TCTTTGAATAATAAAGGTGTCG 


TGTATTTTTAATTTGTTTTC


195LN 


TCTTTGAATAATAAAGGTGTCG 


CAAACTTTTAACATTTAATG 


196 


TCTATTTCCTCAAATTTTTACG 


ATAGTGTAAGCTACCAGC 


197 


TCTAATTTTTATAAGCTCTTG 


GTCATCATATTCCTGAAA 


198 


TCTGCGCTTAAAGAATTAA 


TGTTCGGCGTAAGATT 


199 


TIM TAAAAGAAATTGAAA 


ATTGGTCATTTCTTGAG 


200 


TTTCGTAAATATAATTTTGA


AACAGATTTATTGGTTGG 


201 


TCTAGCGATACCTTTAATTTT 


AGArTOATCAArTTTTTrT 


202 


TCTATGCTGATTAAGTCGC 


GAACCCTGAARGGTAR 


203 


TGTGGTAAAACTGGAOT 


CCAATTGTATTTTTOAAn 


204 


TCTAAGACAGGAGCACCCGT 


ATTTATACTACCTGTTGAATC 


205 


TGCGAGTCAATTGAGC 


TTTAAATTTGTAGTCTTTAATA 


206 


TCTACAAATACTTTGAAAAAAGA 


CTCTTTTACTTTTCCAAAA 


207 


TCTAATTTATTTAAACGTTCCT 


CCOTCCCTTAAGAGAA 


208 


TCTAAAAAGCGGCTAGTCA 


TTGACGATGTTGCATC 


209 


TCTGGACAAAAATCAAAAATA 


TTTCGAATTATTGTGACT 


209d 


TCTGGACAAAAATCAAAAATA 


GTATTGTTGTTGCCTG 


210 


TCTGGAGGAAAATTTCAGAA 


TTTTTGATTTCCCTTTC 


210d


ICIACCICAIAICCI 1 I'/AI 1 1 


TTTATAGTGTGTTTGCAA 


211 


TGTGGACATCGTGGTG 


TTTGCTAGGAACTTTGA 


212 


TCTAAGACTAAAAAAATCATCA 


TGATTCAATTCCTTTTC 


213 


TCTAAACACACCAGTAAAGAA 


TTTTTCCTCTACTTTCTTA 


214 


TCTAAAAATAAAAAAATCTTATTT 


TTTGCTCACCTCCACA 


215 


TTAATAAAAGGATTATTGTCA 


CAATAACTTCTGTAAAATAAA 


216 


TCTGCTCGTTTAATACCACA 


TTCACCCTTAAAATAATT 


217 


TCTAACACTAACATCCCTAGC 


TGCATTTTTCCCTTCT 


218 


TCTAGAGGGAAGGTTATTTAC 


CTCCAGTAAAGTATTAGTATTT 


219 


TCTATCAATAAAGTAACAGCTCA 


GTGAGGTTTTGGTAATT 


220 


TCTAGAACACTATTTAGAATGATAT 


TGCATATAAGTTTTTTAGC 


220d 


TACTATGCGAATCACAG 


TGCATATAAGTTTTTTAGC


221 


TCTAGTTTAGCATTGCAAAT 


CTCATCTAAAGTGCTATCC 


222 


TCTACATTTTATAAAAAGACGG


CTCGTATTTAGGCAACT 


223 


TCTAAGAAAATACGAAGCTATAC 


ATTGGATATGCCATAAA 


224 


TCTGGAGGAAATGAAATATTA 


GACTTTTTGATGTTTACTTT 


225 


TCTGGTATGTCTAATAAGGAAAT 


TTCTTTACTATAAACATCTTCA 


226 


TCTAACAAACTTATTACAGAAAA 


AGCATTTAAAGTTGAATGT 


227 


TCTGTTTCATATGAAAAAGTCC 


GTTAGTCTCTTCAAGATCA 


228 


TCTAGTAGAGGTATTTTTTTACAA 


AAGACCTACCGCCCAA 


229 


TCTGAACGTCGGGTAAGTC 


TACTTCTTTCTCTTTCAATT 


230 


TTTTTAATCGATTTTATTT 


CTTAGTGTTCCGATATGA 


231 


TCATTAATTATTCTTACGGT 


TCTTGTTTTAAGAGCAGA 


231d


TCTTTATACGTTGTTAAACA 


TCTTGTTTTAAGAGCAGA 


232


TGGCTAAGTAAGCATGAG 


ATCATGTTTTCCCTCAA 


233 


TTCCCAGCTAGCTGTC 


ATCTGATATATCCGTTTTAT 
234


TCTATAGAAATTGCTGTATTAATT


TTTTTTGTCTCCTTTTTTA 


235 


TCTATTCGATTTCTTATTCTTG 


AAAGACACGATAAACATAAG 


235d 


TCTGACTCAACCACAGTCTC 


AAAGACACGATAAACATAAG

236


TCTGCAGACCTTACAAGTCA 


ATTTGCAACTTCTTGTATA 


237 


TCTATTGTATTTGCTATTGCA 


TTTAAAAGTATCCTTAAATAAG 


238 


TCTGATATTTTTTCAGCTATTGA 


CTTCCTCCTCAATAGTTG

239


TCTGTTAGTGCTGCTATTGAA 


TTCTCCTCCCCCATTA


240 


TCTAAGAAGCTTACTTTTATTTG 


ATCCAAACGAGTGAAAT 


241 


TCAAAAGGATATTCAAGA 


AGGTGTTGTTGTATTTTC


242 


TCTCATAATATATTAAGATTTTTAGG 


CTTTCTAAGTTTATTAAACATA 


243 


TCTATTCTTGGTCAAGATGT 


GGCATCTGTTACCTTG 


244 


TCTCATGAAAATGTTAAAAAAG 


AAACAACTCCATTATTTTT 


245


TCTAAGTCAACGGTAACAAA 


TAAACGTTGAAGAGCAT 


246


AGGAAACGTTTTTCCT 


CTTATCATATCTTGTTAAATCA 


246d


TCTAACCATAAGGGAAAAGTA 


CTTATCATATCTTGTTAAATCA 


247


TCTGCTAAACAATTAATTGGT 


TTGCCATGGGTTATAG 


248


TCTTTGATGGTGTTGTTATTC 


AGAATTAAAATTTTCATGC


248d 


TCTAAAACTTATTTGTCAAATG 


AGAATTAAAATTTTCATGC 


249


TGGGCTTACCATACTG 


TTTTTTAGATGTTTTATGTG 


250


TPTG^iPPTTAATPTTAAGP 


PTPTTTT A P.TTT A GPTTP A 


251


TPTP A AT ATTTTTTG A A A P A A ft 


TTTPAAAPTPPAGPPA 


252


TTTATTTPAGGTTATATPAA 


GGAGTGCCTTTCTACT 

253


TPTGAAAATTGGAAGTTTGP 


TTPATATPGTAAAGPATP 


254


TCTATTGAAAAGGGAGTTG 


ATCGTCAACCTTAACG 


255


TPTATTf^TTGGTAGAf^ AAATPA 


TTTTAPTTGArPTrTPAr 


256


TATPATGTAAAAATTGATPA 


GTPTTPPATTAATATTPPP 


257


rPTGATl 1 1 1 I'ATAPAAAGGAGG 


PPAATTATTTTGAAAGTTP. 

258


TPTGAAPGTTATAPAGATAAAATfi 


Al 1 1 1 1 1 1 GAATAA 1 A 1 AATCO 

259


1 U 1 W 1 1 IOI vV3 1 nnnnn/AWnw 


TTTATTATPARAAAAGGC 


260 


TCTACTCTTGTCTTAGTTGTTTAT 


ATTCAAAAAATTTTTCAA 


261


TPTATAAAGAAAGCTGAAAATP 


PRAAACGTPAGGTAAA 


262


TCTATAAAAAATGCTATAGCATA 


ACTTATTTTTGATAATATTTCTT 

263


TnTnAGrnTTnTAAArTArTTn 


ATCAGCATTTPTACGAA 


264


TPTGATTTGTTTAGPATGTTG 


ATGTAGAPTP.PTAATGATTT 


265


TCTOTTGCTTOOOTGATTT 


TTTACTGTTCCTTTCGC 


266 


TCTCATCAATCAAATCATTATC 


GAGATTAATTTGATTATATTTT 


267


TrTATPTTTATTATCGGACAA 


AAPATCATTTCCTCCC 


268


TCTAAAGAATTTATTAAAGAATGG 


GTTGATAGTTCCAAAACG 


269


TCTGCAGATGATGGTGGTT 


TAAATGTGTTCCTACTAAATT 


270


TTAAATGATGCAATAACAA 


CATCAATAGCCGAGCTG 


271


TTGCTGGATTATCCTC 


TTTATTTTCCAAATGACA 


272 


TCTGTATTTATGGCAAATAAGA 


TTCACTCGGAGTTGGAG 


272d


TCTATGAGTTCTCTGGAAGTT 


TTCACTCGGAGTTGGAG 


273 


TCTGGTGTCCTCAACTCTG 


AATGTAAATGACAAAGGTA 


274 


TCTGTTCATGATTTTGGTGA 


GTTTi l l'AA'IGGI 1 1 GC 


275 


TCTGGGGTTTGGTTTTATA 


TTTATCATAAGCATCTAGAC 


276


TCTCAATCAGACATTAAAGCA 


PTGATCTCTTGTTGATGC 


277 


TCTATTTGGAGGGGGGAAA 


AAGCAGGGGAGCAATA 


277d


TCTACCAAATTTGACTGGG 


AAGCAGGGGAGCAATA 


278 


TCTGTTACGTTTTTCTTAT


CTGAGCAACACCTGTC 


279 


TCTAAAAAGAAAAGTTTAATTAGC 


GGCAATTTTGTGGCAA 


280 


TTTGATTTTTTTAAGAAAA 


TTGCTTAGTTAATGGCT 


281 


TCTAAGAAATTAATTATAGGTATTT 


AGGCGTTGAATATAATTC


281d


TCTGGTTTTTCGTTTTTGA 


AGGCGTTGAATATAATTC 


282 


TCTCTATTCTCAGATGAAACAA 


CTTTTCAACTCCAAACA 


283 


TCTGTTAAATTAAAATCGTTACTG 


GAGTTGTCTTTTTTTGTC 


284 


TCTATGCAACGATTAGGAC 


GCAATCACAATTGACAT 
285


TTAGGTGAAAGCAAATC 


CTTTGTCTGCTTCACTT 


286 


TCTGGAGGATTTTATATGAAAG 


TTGTATCTTCTCCTGACC 


287 


TCTGCACACACACCTACTAGT 


TTGGTTAATCGTCTTG 


287d 


TCTAACAATCGTTCAAAGC 


TTGGTTAATCGTCTTG 


288 


TCTAAAAAGTTTTTAAAAGTTTT 


TTTAGTTACTTTCATAAATGG 


288d 


TGGAATAATCATCAGTCA 


TTTAGTTACTTTCATAAATGG 


289 


TCTCAATCTAAAGGGCAAA 


ATATAATTCCTCTAAAACTAGC 


289L 


TCTCAATCTAAAGGGCAAA 


CCACTTCAAATTAACTAAC 


290 


TATTACTTATCAAAAGAAAAGG


ATTCCTTGAACACGAA 


291 


TCTCAAGTATTAAATGACAATGG 


GTGCCATTCATTCTCT 


292 


TTGAATCGTAAAAAAAGG


TTGTCCTGTGAACTGTG 


293 


TCTATGGGTCTAGCAACAA 


AGGGTTTATTTGTTGAAG 


293d N-term 


TCTATGGGTCTAGCAACAA


TCCTGATTTATCCACTG 


293d C-term 


TCTGTTACAGCTAAACACGG 


AGGGTTTATTTGTTGAAG 


294 


TCTGGTCATTTTAGTGAAAAA 


CAAAATACCTAAGCTAGC 


295 


TCTAGCGACATAAAAATCAT 


ACGAACTTCCATAACC 


296 


TCTAAAGGTATTATTTTAGCG 


GGCTTCTCCAATCAAA 


297 


TCTATTCAGATTGGCAAATT 


TTGAGTTAATGGATTGTT 


298 


TCTACTAAATTTATTGTTGATTCA 


TAGCGTTATTTCACTGTG 


299 


TTTGAAATACTTAAACCTG 


TTTCTCCGCCCAGTCA 


300 


TCTGCTTCTACAAATAATGTTTC 


CCGTTTATTCTTTCTACTG 


301 


TCTGTAATTAATATTGAGCAAGC 


CATATCTGTTGCATCAAT 


302 


TCTGAAATCAACACTGAAATAG 


AACTGGCTTTTTAGTCAG 


303 


TCTACAAGGCATATAAAAATTTC 


TTTATTATTTAATTCTTCAATA 


304 


TCTAACGAAATCAAATGCCC 


GTCTTTTAGAGCATCGA 


305 


TCTGGACGAGTAATGAAAACA 


CTCTCCTCTAAGACTTTCG 


306 


TCTGGGAAAAAAATTGTTTT 


TCCTTTTGTTACTTTTGC 


307 


TCTAAATTTACAGAACTTAACTTAT 


TTTATCGCCTTTGTTG 


308 


ATGACACAGATGAATTTTA 


ATGTTCAGGTTCTCCG 


309 


TTGCAACTTGGAATTG 


TTCCATTATCTTCAAGTTA 


310 


TCTGCTAAAGAGAGGGTAGAT 


CTCTTCTTCATTTTTCTTA 


311 


TCAATTATTACTGATGTTTAC 


TTTTTTTAAGTTGTAGAATG 


312 


TCTACTGCAACTAAACAACAT 


GTTTTTTGATGCTTCTTG 


313 


TCTAAACGTATTGCTGTTTTA 


TTTACTACTTTGGTTGGC 


314 


TCTAAATTTTATCTTGTTAGACAC 


GTGTGTCATTTTGACCT 


315 


TCTATAGGGGATTATTCAGTAA 


TCCTTCAAGATCATTTAA 


316 


TCTACTGAACGAACATTCGA 


ACCTCCTTTTCTTTCATT 


317 


TCTAATAAGCCATATTCAATAG 


ATCTTCTCCTAACTTACCC


317d N-term 


TCTAATAAGCCATATTCAATAG 


ACTAGCTAGATTCTTAACGC 


317d C-term 


TCTGACTTGAATGGCAATAT 


ATCTTCTCCTAACTTACCC


318 


TCTATTGATTTTATTATTTCTATTG 


GCCTCTTTCTCCAAAT 


319 


TTAAAACATTTTGGTAGTAA 


ATGTCCTGTTATATCTTCTT 


320 


TCTACTATTTATGACCAAATTG 


GCGTTGAATAATGGTT 


321 


TCTAAAAATAAAAAAGATCAGTT 


TATTTCTTTAGTTTCTTCAA 


322 


TCTCAAGAAACAGATACGACG 


TAATAAAAATTATATAAGAACCT 


323 


TCTGGTAATGAGTCAAAGAAC 


TTCTGTCTTATAAGCATAAG 


324 


TCTGGAAGTAAATCAGCTTC 


TTTTTTATAAGCATGTGTA 


325 


TCTGCTTGGCAACTTGTTC 


ATGAGACATAAGGTCTTG 


326 


TCTGGCATCTCAGACTTACC 


GTTGGAGCTCCTACTG 


326L 


TCTAAATTCAAATCTGGGG 


GTTGGAGCTCCTACTG 


326L N-term 


TCTAAATTCAAATCTGGGG 


CATTTCTTTGGTTAAAGC 


327 


TCTGGAGGGAAAATGAATC 


TATCTCGAGTGCTATTTG 


327d N-term 


TCTGGAGGGAAAATGAATC 


CTCTTCATCGACATAGTAA 


327d C-term 


TCTGGCAACTTCAAAGCAT 


TATCTCGAGTGCTATTTG 


328 


TCTGACCAAGTCGGTGTCC


ATTTTACAGTAGTGGAGTTT 


329 


TCTAAATCAAAGACCTCTTCTA 


TGTCCTCATTTTTTCA 


330 


TCTAATAAACGCGTAAAAATC 


TTTAACAGTACGAACACG 
331


TCTACCAGAACAGTAGCAAT 


CCCCC I GT ITTTAAAAT 


332 


TCTACAAAAAACCTGTTATTAA 


ACCCTCATATGATTCC 


333 


TCTATTGATATACAAAAAATAAAA 


TTTAAAATAATGATACATCTC 


333d 


TCTGGATCATTGAGGGCAA 


TTTAAAATAATGATACATCTC 


334 


TCTAATTTAGTAAAAGTGAATAGTG 


TAACCCCGTCTCAACA 


335 


TCTGAAGAAGAAAAATATTTTGA 


TATTTTCGTTTTCTCAAA 


336 


TCTCAGGTTGAAGTTGACTTA 


TTTCTCCAAATAATCTCTC


337 


TCTGAAACAGATTCGTTTGTA 


CCTACTTTTAGTTTTAGAAGA 


338 


TCTGCTATAATAGACAAAAAG 


GAAATCATAGCTTCCC 


339 


TCGAAACCGATTAAGAT 


ACCTTTTACTTTTGGTAGT 


339d 


TCTCAAGTCATGCGCTATG 


ACCTTTTACTTTTGGTAGT 


340 


TCTGGATTTCTCTATAATTACTTC 


TTGTTTGTGAAGTAAAACG 


341 


TCTGGAAAACCATTGTTAAC 


TAATTTAAAAATTGCATAAA 


342 


TCTCAGAAAATTGAAGGTATT 


TTTCGTTACCATATCTAGA 


343 


TCTGAAATGCAAGTTCAAA 


TAAATCATGGAAACTAGC 


344 


TCTGCACAACGCAGAATGT 


AAAGCCCAACCTTCCG 


345 


TCTAAAAACCTGAATTGGG 


GTTTCCACGTCCTTTC 


346 


TCTAATAAAATAGCTAATACAGAAG 


AAGTTTATTCAAATCTGG 


347 


TCTATTGATATTCATTCTCATATC 


AATGTAATGGTTTTTTAATA 


348 


TCTACTGGATCTAAAAAATTAGC 


AGCTAAAATACCTAACCAG 


349 


TCTAAAGATCGCTTATATAATAAA 


ATTTTTTAAACGACTCAT 


350 


TCTGCAAAAGATATAATTAAGGTT 


AGCGGAACGGTGAATA 


351 


TCAGAAGATCAAAAACA 


ATAATCTAAACTATCAGCTCT 


352 


TCTACTTTTTTTAAAAAGCTAAA 


ATCTCCTATTGTAATTTTGA 


352d 


TCTGGTACAGATAGTAAATTTGG 


ATCTCCTATTGTAATTTTGA 


353 


TCTACAATGTTAAAAATTGAAA 


CACCTCTTTTGTCAGA 


354 


TCTATTAAAGAACTAAAAGAATTT 


TTTGTTAGCGAGTAAGTC 


355 


TCTCGCTCACTACCTT 


TTTATCATCCTCCTTAATAA 


356 


TCTAAATTCTATATTATTGATGATG 


AAACGTTTTACTCTGTAAAA 


357 


TTGGAACATTTTTATATTAT 


AAATAAGAATGTTAAAAGAGC 


358 


TTTTATACAATTGAAGAGC 


TTCCCCAAAAATTTCT 


359 


TCAAGAAATAATTACGGT 


ACGCAGTCCCATTTTC 


360 


TCTATAATGAAGGCGGTCT 


CTGGCATGAGGTCTCA 


361 


TCTAGCGTATATGTTAGTGGA 


CCTTTTTTCAATAATAGC 


362 


TCTACTAAACCACAGGGGG 


ATCTTTAATCTTACCATCC 


362d N-term 


TCTACTAAACCACAGGGGG 


TGCTGCTACTGCAATG 


362d C-term


TCTGGTAATGAAGGAAATATCAC 


ATCTTTAATCTTACCATCC 


363 


TCTCTCGAATTAAAAAATATTG 


TAAATTCCTTTGTTGTAATA 


364 


TCTAACTATATGGGTATGGGC 


ACCATCAGTTGTCACC 


365 


TCTGGAACTGCTACATATAGTAGG 


TATTGACCAGTGCACG 


366 


TGGCTTGACATTATTTT 


TTTTTTTGAATTTGTAAAAG 


367 


TCTAAGAAATTAAAAATATTCCC 


AGAGATTATTTTTATTTTAAAT 


368 


TCTAAAATCATTATTCAACGT 


TTTATTTTTAGTATCTAAAACG 


369 


TCTAGTAGAATGATTCCAGG 


TTTAGAAACTCCAAGTATCTC 


370 


TCTACCGAATTTAATGACG 


GTTAATTTGACTATTGATATATT 


371 


TCTAAAGATAGATATATTTTAGCAG 


TAAACTCTCAAAAGCTAAAC 


372 


TCAGAAAAATATTCCACT 


ACGTTCTTCTCTGGCT 


373 


TCTGAAATTGGTCAGCAAA 


ACTTAAATGGAACAACC


374 


TCTAAGTTCGAAAATATAATATATG 


TTTGCCTAAAAAATTAGG 


375 


TCTGAAAAAGAAACTATTTTAAGT 


GGCTTTCCTCCCTTCA 


376 


TCTAAAGAAAAGAAAAATTTGG 


TTCATCTTTTTCAATATCA 


377 


TCTGGTAATAAACTGATGTATCA 


GTGAGAGTGTCTTTGTTT 


378 


TCTGAAGATCAACTCACTATATTT 


CAGATTTTTAGCTACTTGTC 


379 


TCTCAAATTACCCGAGAAG 


TCTAGAGCGCTTTATAAG 


380 


TCTCTTAAAAGATTACTTACTGAAG 


TTTTCTAATAGTTAGAAGCC


381 


TCTCTTGGGATAGCTCACA 


TTTTAAATGTGCAGAGA 


382 


TCTATAAAGTTTAAATTA I I I I fTAA 


ATTTATAATTTCCTTGGG 
383


TCTATTTTACAGACGAATATACTAT 


TCTATAATATCTCTCTAAAGTGA 


384 


TCTAGAATAATTGTTGTCGG 


CCTCGCTAACATATCAC 


385 


TCTAATGTAAAAAAACGC 


AGCTCTTACAGTCTTGC 


386 


TCTCTAGTATCAAAGGAGAAAGC 


TTGTCTGAGTGACCAA 


387 


TCTGGTATGTTGTTAGCA 


ATAATATGAAATATGTTGTTCA 


388 


TCTCTTATGATAATAAATTCATTCG 


TCCGCAGAGTAAAAAA 


389 


TCTATGAATAGTGAACATAAAATT 


TTCATAAATGTGCCAA 


390 


TCTAGGGAAACTTACTGGA 


TTCATCTCTGCTCACC 


391 


TCTAAAAAAGTCATCGATTTAA 


TTCTCCTTCAGCTTTTA 


392 


TCTATTACATATGATTTCACAAG 


GTCATTTTTTCTAAAGTTTG 


393 


TCTAATAAATCTTGGTTGAGAA


TTTTTGTAGTTGTTTCAAT 


394 


TCTCCTATGTTGTCTGTTGG 


TTTCATTAGATAACTATTCAGC 


395 


TCTACTTATCAAAAAACAGTTG 


TATAGACTGAAGATAATTAATTAA 


396 


TTTGTCAAAGGGATTT 


AAATCGATTAATCAAGTC 


397 


TCTAAATTATTTGATAAGTTTATAGA 


TCTAAAGTAGTCCTTTAGACTA 


397d 


TCTAAAACTGCTACAGTTAG 


TCTAAAGTAGTCCTTTAGACTA 


398 


TATTTAGAACAATTAAAAGAGG 


TTTGTCCATAATCATTTC 


399 


TCTAAAGTTTTAGTAGTTGATGAT 


GGTAGATATGCCTAACATT 


400 


TCTAAAATAGTTGAAGGCG 


GTTTCCTTCCAAAAAA 


401 


TCTGGAATTGAATTTAAAAATG 


TCCATGCTTAATAGCC 


402 


TCTGGAAAATATTTTGGTACAG 


ATCTAAACCAATTTCTGTAC 


403


TCTfiAGGTTAGAATGGTAACTC 


GTCCACAAAAACGTCT 


404 


TCTAAAATAGATGACCTAAGAAA 


TAGATGTTCTACGGAGAA 


405 


TTGAAAATTCAGTATTATCA 


AAAGATGGCAAGCCAT 


406


TCTttATAAAAATAATTTAGAAGACT 


TCTCTCTCCACACCATA 


407 


TCTAAAATTGACATGAGGAA 


CTTACCTCCTGTGGCT 


407d 


TCTAAAATTGACATGAGGAA


CTTTTGTTGGTTACCTC 


408 


TCTAACCACTTACTTAACCTCA 


TATTGTTAAATATGATGAAATG 


409 


TCTAAGGTAGTAGTAGCTATTGAT 


ATGATTATACAAATTGATTAAT 


409d 


TCTACTGAAGAGAGAAATCCT 


ATGATTATACAAATTGATTAAT 


410 


TCTGCTTTATTATCAGTTATTGTC 


TCCCTCTTCCTTGACA 


411 


TCTAAAGACTATATTAACAGAATATT 


AACGTTTTTGAGCTTT 


412 


TCTGGATTTTTTGCACAGC 


TTTTGTCTTAAACGTTCT 


413 


TCTATTGTTGGTGAACAAGA 


TTTAGATAGTCTAGCCATTT 


414 


TTAAATCAATATTTTCTGC 


ACGGCTTGGGGCAGAG 


415 


TCTGAGCGAATTCCTGTTC 


TACCATTATCCGTGCT 


416 


TCTGAAGTCATTCGTGAACA 


ACTATTAAACTCCAATGTTA 


417 


TCAAAACAATATGATTATATC 


GCGCATTGTAACAAAT 


418 


TCTAGCAAGCCTAATGTTG 


TTTTGGTAAAAGGTCTG 


419 


TCTGATTTAAATAATTACATCGC 


TCCTGGAAAGTTCATC 


420 


TCTAAACGTGAATTACTACTCG 


TAGTTTATCTAAAGCGTTC 


421 


TCTATACGCCAGTTTTTAAG 


TTTATGTATAGAAACAGCAG 


422 


TTTTCGAGCGATTTTG 


AATGTACATAACAATAGAGAGC


423 


TCTGTAACCAAAGTTGAAGAG 


CAACGATCCCAAGAAC 


424 


TCTATGAAAGATTTTATTGAATG 


GCCATTCTTACCTCCT 


424d 


TCTATGAAAGATTTTATTGAATG 


ACGTTTTTTCTGACCG 


425 


TCTATAGCCTTTAATAGTTTATTT 


TATAAAATAAATTTGAAGATCT 


426 


TCTD440ACAGTTTATAATATAAACCATG 


ATCATCTTGTACCAACTC 


427 


TATTCTTTTGAAGAACTTTT 


GCCAATAAATTCACGG 


428 


TCTATAAAAATTTTGATCCC 


AGTCTGTTTTTTAACAAAAG 


429 


TCTAATCATTCCATTGAATC 


TGGTTTTAGAACAACTTTA 


430 


TTACAAAAAAAATATCGG 


AATTAAGCTGAAAATGAC 


431 


TCTGCGGCTCAATTAGCTG 


ATTATATTCTTTTAATTTGTCA 


432 


TCTCGTACCTTCAAACCAG 


CTTACGACGTCCTGGA 


433 


TCTATTAAAGCAACTTTTACTC 


GTGTGTCATGACTACTGTAC 


434 


TCAATTTTTCAGACAACA 


TGAGTAGAGCACAAGC 


642 


TCTAGAAAACGTAATGATACATT 


GAAACGAATACGTTCTT 
643


TCTGATTGTCAAATTACACCA 


ACTACCTACCGTTTTCAC 


644 


TCTATTTTTCGTGGTGATAA 


TTTGATGGTAACAGTCG 


645 


TTTTTTAATATTGAATATCAC 


AGAAAGGCGCTCTTCT


646 


TCTAAGGGAGTCCAATATATG 


TATCTTTAATAAAGCCCTA 


647 


TCTCGTCGCATGAATACCA 


CATCCCATAAATTTGTT 


648 


TCTATAGAATTTTCAGGGC 


CAAGACATTTCTTAAAGC 


649 


TCTGCTACTCACTCTAACTCAG 


TTTTGTTTTAGCGATG 


650 


TGCTCTTCTTCAAATACT 


TTTTAAACCATGCTGT 


651 


TCTCTAACACCATTTACAAAAG 


TTTGTAAAGACCTTCTTT 


652 


TCTCAACAAGGTATTATGGATA 


TTCCTCGTTTATTAATTT 


653 


TCTAAAATTTTAGGTACACCA 


AAAGAAAAGATGTGCC 


654 


TCTGGAAAAATGGTTAAGAA


CTGTGCAGGCTCAAAT 


655 


TCTAAATTCGTCCGAACCGT 


AATTGTCCAGTCTAAGTTA 


656 


TCTGGTCTTCCAACGCAGC 


ATTTAGTGTTATTTCTCCTG 


657 


TGCTCAGGTAAAACAT 


TTTTTTAAGTGATGATGAA 


658


TCTGAAAGCAAATCTTTGC 


CTTTGTCTGCTTCACTT 


659 


TGTGCTAATTGGATTG 


TTTTGGGGTTACTTTAC 


660 


TGTGGAAATGTCGGAG 


TTTTGCTGAAATAATGTT 


661 


TGTCAGTCAAACCACA 


ATCATACGAATGCAAC


662 


TCTGCTAGTTTTTATTTTTTCC 


TTTTTCATATTTTTTCAAA 


663 


TGTGGAAGTAAATCAGC 


ATTATTTTTATAAGCATGTG 


664 


TCTGTTAAATTAAAATCGTTACTG 


GAGTTGTCTTTTTTTGTC 


665 


TCTATTGCTGGTCCTAGTG 


GATAAGCACTTTCCTTAA 


666 


TTATTTTTTGGAAATTGG 


GCCTAAAAACCAATCA 


667 


TCTGCTGTATTTACACTCGTC 


ATGTTTATGGCTTGCT


668 


TTTTATATGAAAGAACAACA 


TTGTATCTTCTCCTGACC 


669 


TCAATTATTATTGGGTTAA 


ATATACCCTAGACTTTTTGA 


670 


TCTCCTAAATTAACCCTAGTCT 


GGCTTTAAAGTTCGATA 


671 


TCTAGTCTTGCGAAGGCAG 


TTTATCGTAAGCACTTAGG 


672 


TCTGTATTTACACTCGTCTTACA 


ATGTTTATGGCTTGCTT 


673 


TCTGGAGGATTTTATATGAAAG 


TTGTATCTTCTCCTGACC 


674 


TCTGTTAAATTAAAATCGTTACTG 


GAGTTGTCTTTTTTTGTCT 


675 


TCTGGTTCATCAGACAAACA 


TTCAACTTGATTGCCA 


676 


TCTGTAGTTAAAGTTGGTATTAACG 


TTTTGCAATTTTTGC


677 


TCTGTATTAGAAGTACATGCTGA 


TTTTAATGCTGTTTGAA 


678 


TCTGAGACACCAGTAATGGC 


TTTTTTAGCTAAGGCTG 


679 


TCTGCTAACAAGCAGGATC


TTTTGCTAAACCTTCTG 


680 


TCTAATAAGTCCAGTAACTCTAAG 


ATTCATATTAACACGATGC 


681 


TCTGCTTTTGATGTAATTATGC 


TTTGCGTTTTGGAGGG 


682 


TCTATTAACTATGAGGTTAAAGC 


TGCACCTTGATGGCGA 


683 


TCTGTAATTGTTGAACTTAGTTTG 


CCATAATATTTGATGCTG 


684 


TCTCTTAGGAAGTATAAGCAAA 


TTCTAATCCTACAGCATG 


685 


TCTAAAATTTGTCTGGTTGG 


AAAAATTCCTCCTAAATTAA


686 


TCTGACTTTTATGATATCAATCTT 


AAAGTTTTGACTATTACTGATAG 


687 


TATGCTATTATGCAAAAAG 


TGGGGGAGATAGTTATG 


688 


TCTGCAATCGTTTCAGCAG 


TTGACAGAAAGCTAATTG 



TABLE III - RESULTS FOR in vivo GBS CHALLENGE 



GBS# 


% survival 


Pre-immune 


Post-immune 


1 


18.7 


22.2 


4gst 


19.4 


37.2 


4his 


25.0 


75.0 


8 


14.3 


42.1 


10 


29.1 


36.0 


15 


30.0 


60.9 



GBS# 


% survival 


Pre-immune 


Post-immune 


110 


11.1 


30.0 


113 


17.6 


29.4 


114 


40.0 


52.2 


117 


27.8 


36.8 


119 


36.4 


52.2 


139 


23.1 


26.7 
16


33.3 


53.8 


18 


29.4 


50.0 


21 


5.9 


10.0 


22 


3fi R 


63.1 


24 


vU.J 


41.4


25 


28.6 


85.7 


32 


20.0 


25.0 


35 


0.0


17 fi 


45 


2fi 7 


37.5


48 


20.0 


25.0 


52 


14.2 


17.3 


53 


23 R 


29.2


54 


22.7 


44.0 


55 


50.0 


52.9 


57 


33.3 


55.6 


58 


6.7


11.8


62 


15 R 


3fi 4 


63 


21.4 


42.9 


65 


3.7 


23.3 


R7 


9"} <1 


97 8 


71 


1"} "3 
I CO 


9fi 7 


73 


28 R 


39.1 


80 




56.5 


84 


oo.o 


Of .u 


R<i 


OVJ.O 


R9 


90 


14.3 


22.7 


94 


25.0 


30.0 


95 


16.7 


23.1 


98 


5.9 


11.1 


100 


26.9 


42.9 


103 


16.7 


52.9 


106 


10.0 


18.2 



150 


21.6 


44.4


153 


25.0 


30.0 


155 


22.6 


36.8 


157 


14.3 


31.8 


158 


22.6


40.0


163 


29.6 


37.9 


164 


25.0 


43.8 


173 


17.9


?R 7 


17R 


20.0 


"3R Q 


177 


21.7 


33.3 


181 


5.0 


21.7 


1RR 


41.2 


59 R 




11.8 


9S 5 


189 


21.4 


31.6 


195 


32.1 


64.7 


?0fi 


33.3 


•if) 0 


211 


30.8 


3 

00.0 


232 


50.0 


57.1 


233 


34.8 


55.2 




57.1


7D R 


94"3 


4fi 7 


59 Q 


263 


15.4 


35.7


273 


61.5 


75.0 


97R 


9*3 R 

£.0.0 


44.4


9QR 




9R fi 


297 


13.3 


9^ 5 


298 


20.0 


22.2 


302 


30.0 


52.2 


304 


33.3 


40.9 


305 


42.1 


70.0 


316 


38.5 


42.9 


318 


7.1 


15.8 
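As a reading aid for Table III, the pre-immune and post-immune survival columns can be compared row by row. The sketch below is illustrative only: the "gain" measure (post-immune minus pre-immune, in percentage points) and the helper names are assumptions introduced here, not a metric defined in this document. The four rows are copied from the legible start of the table.

```python
# Rows copied from Table III: (GBS#, pre-immune % survival, post-immune % survival).
ROWS = [
    ("1", 18.7, 22.2),
    ("4gst", 19.4, 37.2),
    ("4his", 25.0, 75.0),
    ("8", 14.3, 42.1),
]


def survival_gain(pre: float, post: float) -> float:
    """Hypothetical helper: percentage-point change, rounded to one decimal."""
    return round(post - pre, 1)


def rank_by_gain(rows):
    """Return (GBS#, gain) pairs sorted from largest to smallest gain."""
    gains = [(name, survival_gain(pre, post)) for name, pre, post in rows]
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_by_gain(ROWS):
        print(f"GBS{name}: {gain:+.1f} percentage points")
```

A simple difference ignores group sizes and statistical significance, so it should be read as a rough ordering rather than a measure of protection.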



TABLE IV - COMPARISON OF GBSnnn NUMBERING AND SEQ ID NUMBER



GBS numbering   Sequence listing
GBS1            SEQ ID 3532 & 8736
GBS2            SEQ ID 4530 & 8818
GBS3            SEQ ID 6266 & 8958
GBS4            SEQ ID 2 & 8786
GBS5            SEQ ID 2598 & 8674
GBS6            SEQ ID 398 & 8496
GBS7            SEQ ID 8790 & 9798
GBS8            SEQ ID 8694
GBS9            SEQ ID 4540 & 8822
GBS10           SEQ ID 8718
GBS11           SEQ ID 5884 & 8930
GBS12           SEQ ID 8764 & 9692
GBS13           SEQ ID 8484
GBS14           SEQ ID 5406 & 8892
GBS15           SEQ ID 4 & 8710
GBS16           SEQ ID 944 & 8538
GBS17           SEQ ID 1770 & 8602
GBS18           SEQ ID 6860 & 9002
GBS19           SEQ ID 4422 & 8812
GBS20           SEQ ID 308 & 8488
GBS21           SEQ ID 8762



GBS345          SEQ ID 2442
GBS346          SEQ ID 2768
GBS347          SEQ ID 2766
GBS348          SEQ ID 8658
GBS349          SEQ ID 2360
GBS350          SEQ ID 8698
GBS351          SEQ ID 2970
GBS352          SEQ ID 8692
GBS353          SEQ ID 3454
GBS354          SEQ ID 8754
GBS355          SEQ ID 8752
GBS356          SEQ ID 8724
GBS357          SEQ ID 8720
GBS358          SEQ ID 3184
GBS359          SEQ ID 3948
GBS360          SEQ ID 3926
GBS361          SEQ ID 8770
GBS362          SEQ ID 8768
GBS363          SEQ ID 3816
GBS364          SEQ ID 1452
GBS365          SEQ ID 1398
GBS22           9FO ID R^Rd
GBS23           SEQ ID 8512
GBS24           SEQ ID 1694 & 8598
GBS25           SFO in ^180 X 8714
GBS26           SFO in 8890
GBS27           SEQ ID 8774
GBS28           SEQ ID 8738
GBS29           9FO in R744
GBS30           SFO ID 8860
GBS31           SEQ ID 8702
GBS32           SEQ ID 8910 & 10142
GBS33           SFO in 57^4 A 8Q19
GBS34           SFO in 5750 A 8Q18
GBS35           SEQ ID 8908
GBS36           SEQ ID 8542
GBS37           SFO in 8584
GBS38           SFO in 9199 & 8849
GBS39           SEQ ID 8480
GBS40           SEQ ID 8654
GBS41           9FO ID 117R A fl'iR?
GBS42           ccn in 4R c iR X. RR^O
GBS43           SFO ID 672 & 8520
GBS44           sfo in qooo
GBS45           cpn in qoi a
GBS46           SFO in 1A^4 A A80A
GBS47           SFQ ID 8588
GBS48           ULsx ILJ OvJv7*T vX UwfU
GBS49           SFO in R4Q4 & Q4Q0
GBS50           SFO tn 19^8 & A588
GBS51           SFO ID 5410
GBS52           SEQ ID 3920
GBS53           qcn in R^RR
GBS54           qrn in "3449
GBS55           SFO ID 9020 & 10338
GBS56           SEQ ID 2510 & 8668
GBS57           SFO in 8854
GBS58           SFO m 8884
GBS59           SEQ ID 3744
GBS60           SEQ ID 8760
GBS61           SFO in R77R
GBS62           SFO in 2244
GBS63           SEQ ID 390
GBS64           SEQ ID 374
GBS65           9FO in 8544
GBS66           9FO in ^098
GBS67           SEQ ID 3746
GBS68           SEQ ID 4012
GBS69           SFO in 4Q1R
GBS70           9FO in ?718
GBS71           SEQ ID 8906
GBS72           SEQ ID 1348
GBS73           SFO in 99fl
GBS74           9FO in RR??
GBS75           SEQ ID 8926
GBS76           SEQ ID 5862
GBS77           SEQ ID 3256
GBS78           SEQ ID 3262
GBS79           SEQ ID 3264
GBS80           SEQ ID 8780



GBS366          9FO in
GBS367          SEQ ID 1340
GBS368          SEQ ID 1598
GBS369          9FO in 4899
GBS370          SFO ID 8844
GBS371          SEQ ID 4926
GBS372          SEQ ID 4956
GBS373          ccn in 5089
GBS374          cpn in 8878
GBS375          SEQ ID 326
GBS376          SEQ ID 5380
GBS377          9FO in 54R8
GBS378          SFO in 5570
GBS379          SEQ ID 8918
GBS380          SEQ ID 156
GBS381          SFO in 8Q^4
GBS382          SFO in 8810
GBS383          SEQ ID 4738
GBS384          SEQ ID 8836
GBS385          'nFO in 1DQ4
GBS386          cpn in QmR
GBS387          SFO in 8558
GBS388          SFO ID 9040
GBS389          SFO in 8^18
GBS390          SFO in AQc:9
GBS391          SFO ID 8599
GBS392          SFO ID 6220
GBS393          ccn in RQRR
GBS394          cpn in RQRn
GBS395          SFO ID 6276
GBS396          SFO ID 8468
GBS397          cpn in R9R9
GBS398          "5FO in RR0R
GBS399          SFO ID 1960
GBS400          SEQ ID 3154
GBS401          SFO in "3170
GBS402          SFO in 49^8
GBS403          SEQ ID 8798
GBS404          SEQ ID 8800
GBS405          SFO in 8508
GBS406          SFO in 8508
GBS407          SEQ ID 6484
GBS408          SEQ ID 9042
GBS409          SFO ID 6678
GBS410          SFO in 4084
GBS411          SEQ ID 9044
GBS412          SEQ ID 9046
GBS413          SFO ID 979
GBS414          SFQ ID 8946
GBS415          SEQ ID 8944
GBS416          SEQ ID 6044
GBS417          SFQ ID 1874
GBS418          SFQ ID 5146
GBS419          SEQ ID 2638
GBS420          SEQ ID 2104
GBS421          SEQ ID 2108
GBS422          SEQ ID 714
GBS423          SEQ ID 6884
GBS424          SEQ ID 4874
GBS81           cpn m ?706
GBS82           SEQ ID 2898
GBS83           SEQ ID 8772
GBS84           SFO in 4189
GBS85           SFO in 91fi
GBS86           SEQ ID 2978
GBS87           SEQ ID 3452
GBS88           SFO in 5fiQ4
GBS89           SFO in 9889
GBS90           SEQ ID 8476
GBS91           SEQ ID 8938
GBS92           SFO in 8Q84 A 109*38
GBS93           SFO in 9848
GBS94           SEQ ID 1592
GBS95           SEQ ID 2224
GBS96           omw lLj ^ iou
GBS97           ol-w ilj ouu
GBS98           SFO ID 8746
GBS99           SFO ID 4240
GBS100          oca in 8789
GBS101          oca in RQ09
GBS102          SFO in 88Q4
GBS103          sfo in fi
GBS104          oca m 9.779,
GBS105          qpn m iaoo
GBS106          SFO in 8609
GBS107          O I — Vk ID UUtU
GBS108          olU IU OOOZ
GBS109          opn in 41 18
GBS110          SFO in 88^9
GBS111          SFO ID RR4?
GBS112          cpn in rqp.4
GBS113          ccn in ^nn
GBS114          SFO in 8968
GBS115          SFO ID 5164
GBS116          qcn m R1^9
GBS117          SFO in 8Q89
GBS118          SFO in 9608
GBS119          SFO in 8814
GBS120          SFO in 8874
GBS121          ocu IU OOtO
GBS122          SFO ID 9006
GBS123          SFO ID 6310
GBS124          qpn m 9R0
GBS125          9FO in ^879
GBS126          SFO ID 67^6
GBS127          SEQ ID 8816
GBS128          SFO in 7^9
GBS129          SFO in 8QQ0
GBS130          SEQ ID 9004
GBS131          SEQ ID 6198
GBS132          qpn in 87*30
GBS133          SFO in 474
GBS134          SFO in Q008
GBS135          SEQ ID 8882
GBS136          SEQ ID 1188
GBS137          SEQ ID 3960
GBS138          SEQ ID 9052
GBS139          SEQ ID 884



GBS425          SEQ ID 3978
GBS426          SEQ ID 3976
GBS427          SEQ ID 6958
GBS428          SEQ ID 3398
GBS429          SEQ ID 340?
GBS430          SEQ ID 8840
GBS431          SEQ ID 8902
GBS432          SEQ ID 8534
GBS433          SEQ ID 2558
GBS434          SEQ ID 8590
GBS435          SEQ ID 484
GBS436          SEQ ID 847?
GBS437          SEQ ID 466
GBS438          SEQ ID 362
GBS439          SEQ ID 900
GBS440          SEQ ID 8536
GBS441          SEQ ID 936
GBS442          SEQ ID 940
GBS443          SEQ ID 998
GBS444          SFO ID 1776
GBS445          SFO in 88^4
GBS446          SEQ ID 2048
GBS447          SEQ ID 1654
GBS448          ccn in R5Q?
GBS449          SFO in 1R^4
GBS450          SEQ ID 1630
GBS451          SEQ ID 2098
GBS452          SFO in 9089
GBS453          SFO in R8?fi
GBS454          SEQ ID 1734
GBS455          SEQ ID 1690
GBS456          SFO in 1884
GBS457          SFO ID R656
GBS458          SEQ ID 8650
GBS459          SEQ ID 2152
GBS460          SFO in ?14R
GBS461          SFO in ?^Q4
GBS462          SEQ ID 2778
GBS463          SEQ ID 8688
GBS464          SFO in R6R4
GBS465          SFO in 868?
GBS466          SEQ ID 2694
GBS467          SEQ ID 2350
GBS468          SFO ID 8660
GBS469          SEQ ID 2998
GBS470          SEQ ID 2988
GBS471          SEQ ID 2924
GBS472          SEQ ID 2910
GBS473          SFQ ID 2882
GBS474          SEQ ID 2878
GBS475          SEQ ID 2856
GBS476          SEQ ID 8690
GBS477          SEQ ID 3112
GBS478          SEQ ID 3432
GBS479          SEQ ID 3460
GBS480          SEQ ID 3504
GBS481          SEQ ID 8734
GBS482          SEQ ID 8740
GBS483          SEQ ID 3606
GBS140          ocn in RR39
GBS141          SEQ ID 1768
GBS142          SEQ ID 8600
GBS143          sfo in Qft^d
GBS144          ccn in 993ft
GBS145          SEQ ID 8700
GBS146          SEQ ID 8696
GBS147          SFO in ft^fi
GBS148          sfo in Q(im
GBS149          SEQ ID 8732
GBS150          SEQ ID 3736
GBS151          OUvJ I U O I 00
GBS152          oca in qoA9
GBS153          *JLV>( II— 1 OC/0*T
GBS154          SEQ ID 4024
GBS155          ouU IU Of oD
GBS156          oca |r\ AR4R
GBS157          °,fo in 4819
GBS158          SFO in ^04
GBS159          oca m AR9A
GBS160          QCA in AQ9.4
GBS161          9FO in AQ99
GBS162          9Fn in ira
GBS163          cca m 99zl
GBS164          CCA in 11 H9
GBS165          9FO. in qR79
GBS166          C.FO in A719
GBS167          cca m
GBS168          oca m QfHR
GBS169          C.FO in 4?4R
GBS170          CCA m RQA9
GBS171          cca m R790.
GBS172          cca m fi704
GBS173          °»FO in 8788
GBS174          SFO ID R150
GBS175          oca m fi9
GBS176          cca m A47A
GBS177          SFO in 887fi
GBS178          °.FO in fif!7R
GBS179          cca m ftft4R
GBS180          cca m *5nfi9
GBS181          ^FO in 1Q94
GBS182          SFO ID 3774
GBS183          qcn in 47QR
GBS184          cpn in iQ7fl
GBS185          SFO ID 1046
GBS186          9FO in A470
GBS187          oca m A44
GBS188          oca m ^4in
GBS189          ^FO in RQRfi
GBS190          ^FO in AR42
GBS191          oca m 1A14
GBS192          oca m ARi ft
GBS193          SFO in 9*382
GBS194          SEQ ID 3912
GBS195          SEQ ID 8
GBS196          SEQ ID 4944
GBS197          SEQ ID 5486
GBS198          SEQ ID 8896





GBS484          SFO ID 3^62
GBS485          SEQ ID 3552
GBS486          SEQ ID 3762
GBS487          SFO in 37RR
GBS488          SFO ID 3732
GBS489          SEQ ID 3730
GBS490          SEQ ID 3704
GBS491          SFO in
GBS492          SFO ID 3252
GBS493          SEQ ID 3244
GBS494          SEQ ID 3238
GBS495          9FO in R799
GBS496          ^FO in R71fi
GBS497          SEQ ID 3876
GBS498          SEQ ID 3858
GBS499          SFO in A7 c ift
GBS500          <^FO in 4099
GBS501          SEQ ID 4106
GBS502          SEQ ID 1406
GBS503          ^fo in ft^Rn
GBS504          'iFO in 4^71^
GBS505          SFO ID 4566
GBS506          SFO ID 8832
GBS507          cca m AA9D
GBS508          oca m 4R44
GBS509          SFO ID 8828
GBS510          SFO ID 8826
GBS511          SFO in 4RQ2
GBS512          ocn in <lQ7n
GBS513          SEQ ID 4974
GBS514          SFO ID 8862
GBS515          oca in 8RR4
GBS516          oca m R8RR
GBS517          SEQ ID 8868
GBS518          SEQ ID 9012
GBS519          SFO ID "SORR
GBS520          ccn ID RR70
GBS521          SFO ID 5228
GBS522          SEQ ID 322
GBS523          ciFO in 84Q9
GBS524          ocn ID 8RQ4
GBS525          SEQ ID 5430
GBS526          SEQ ID 5414
GBS527          SFO ID 5524
GBS528          SFO ID 8898
GBS529          SEQ ID 5670
GBS530          SEQ ID 5630
GBS531          SFO ID 5588
GBS532          SFO ID 1324
GBS533          SEQ ID 8914
GBS534          SEQ ID 8550
GBS535          SFO ID 85RR
GBS536          SFO ID 1288
GBS537          SEQ ID 5798
GBS538          SEQ ID 8920
GBS539          SEQ ID 158
GBS540          SEQ ID 8482
GBS541          SEQ ID 184
GBS542          SEQ ID 9048
GBS199          SEQ ID 1162
GBS200          SEQ ID 8936
GBS201          SEQ ID 4550
GBS202          SEQ ID 8666
GBS203          SEQ ID 6478
GBS204          SEQ ID 1996
GBS205          SEQ ID 18
GBS206          SEQ ID 8552
GBS207          SEQ ID 3822
GBS208          SEQ ID 3916
GBS209          SEQ ID 3918
GBS210          SEQ ID 3738
GBS211          SEQ ID 4680
GBS212          SEQ ID 8750
GBS213          SEQ ID 8500
GBS214          SEQ ID 8498
GBS215          SEQ ID 9022
GBS216          SEQ ID 8606
GBS217          SEQ ID 9024
GBS218          SEQ ID 8652
GBS219          SEQ ID 8646
GBS220          SEQ ID 2730
GBS221          SEQ ID 9028
GBS222          SFO in 3849
GBS223          SFO in ft7QA
GBS224          SEQ ID 9026
GBS225          SEQ ID 8834
GBS226          SFO in AQfifi
GBS227          sfo in ^.rno
GBS228          SEQ ID 5050
GBS229          SEQ ID 9056
GBS230          SFO ID 1996
GBS231          SEQ ID 5810
GBS232          SEQ ID 5830
GBS233          SEQ ID 4722
GBS234          SEQ ID 1106
GBS235          SEQ ID 8560
GBS236          SEQ ID 6162
GBS237          SEQ ID 8706
GBS238          SEQ ID 4246
GBS239          SEQ ID 8980
GBS240          SEQ ID 8986
GBS241          SEQ ID 9030
GBS242          SEQ ID 9032
GBS243          SFO in RR7R
GBS244          SEQ ID 6554
GBS245          SEQ ID 8994
GBS246          SEQ ID 6864
GBS247          SEQ ID 8856
GBS248          SEQ ID 454
GBS249          SEQ ID 8620
GBS250          SEQ ID 8634
GBS251          SEQ ID 2258
GBS252          SEQ ID 8648
GBS253          SEQ ID 2526
GBS254          SEQ ID 2710
GBS255          SEQ ID 2966
GBS256          SEQ ID 3424
GBS257          SEQ ID 3550



GBS543          SEQ ID 893?
GBS544          SEQ ID 5880
GBS545          SEQ ID 44
GBS546          9FO in QniA
GBS547          sfo in ip
GBS548          SEQ ID 8614
GBS549          SEQ ID 8612
GBS550          CCD ID 479D
GBS551          SEQ ID 471D
GBS552          SEQ ID 1086
GBS553          SEQ ID 1088
GBS554          SFO ID 1 13R
GBS555          SEQ ID 8748
GBS556          SEQ ID 5968
GBS557          SEQ ID 774
GBS558          SFO ID 1199
GBS559          SFO ID 1196
GBS560          SEQ ID 1268
GBS561          SEQ ID 8518
GBS562          SEQ ID 8676
GBS563          9FO in
GBS564          SEQ ID 2300
GBS565          SEQ ID 8950
GBS566
GBS567          OL W 11-/ DOU
GBS568          sfo in R?on
GBS569          SFO in RQ'Sfi
GBS570          SFO in AQ79
GBS571          SFO in RQ7D
GBS572          SEQ ID 3300
GBS573          SEQ ID 3304
GBS574          IU Of iZSj
GBS575          SFO ID R81D
GBS576          SEQ ID 4418
GBS577          SEQ ID 8808
GBS578          SFO in 4?R9
GBS579          SFO in d^7R
GBS580          SEQ ID 193?
GBS581          SEQ ID 8622
GBS582          SFO ID 8694
GBS583          SFO ID 1969
GBS584          SEQ ID 8708
GBS585          SEQ ID 8672
GBS586          SFO in P>dAA
GBS587          SFO in RQ7R
GBS588          SEQ ID 8804
GBS589          SEQ ID 8514
GBS590          SFO ID 8510
GBS591          SEQ ID 630
GBS592          SEQ ID 8504
GBS593          SEQ ID 514
GBS594          SEQ ID 8978
GBS595          SEQ ID 6738
GBS596          SEQ ID 6712
GBS597          SEQ ID 6686
GBS598          SEQ ID 6674
GBS599          SEQ ID 6662
GBS600          SEQ ID 8988
GBS601          SEQ ID 8578



WO 02/34771 



-2996- 



PCT/GB01/04789 



GBS258          SEQ ID 3752
GBS259          SEQ ID 8756
GBS260          SEQ ID 4162
GBS261          SEQ ID 1530
GBS262          SEQ ID 8572
GBS263          SEQ ID 1616
GBS264          SEQ ID 8824
GBS265          SEQ ID 4554
GBS266          SEQ ID 4652
GBS267          SEQ ID 4980
GBS268          SEQ ID 5038
GBS269          SEQ ID 5534
GBS270          SEQ ID 1998
GBS271          SEQ ID 8570
GBS272          SEQ ID 22
GBS273          SEQ ID 5994
GBS274          SEQ ID 774
GBS275          SEQ ID 2308
GBS276          SEQ ID 8942
GBS277          SEQ ID 8954
GBS278          SEQ ID 8524
GBS279          SEQ ID 6292
GBS280          SEQ ID 6254
GBS281          SEQ ID 4458
GBS282          SEQ ID 4444
GBS283          SEQ ID 9034
GBS284          SEQ ID 6456 & 8974
GBS285          SEQ ID 8802
GBS286          SEQ ID 9036
GBS287          SEQ ID 5354
GBS288          SEQ ID 5374
GBS289          SEQ ID 8616
GBS290          SEQ ID 8680
GBS291          SEQ ID 8530
GBS292          SEQ ID 8998
GBS293          SEQ ID 8582
GBS294          SEQ ID 8604
GBS295          SEQ ID 2722
GBS296          SEQ ID 2658
GBS297          SEQ ID 3024
GBS298          SEQ ID 8704
GBS299          SEQ ID 3268
GBS300          SEQ ID 4170
GBS301          SEQ ID 8576
GBS302          SEQ ID 8670
GBS303          SEQ ID 8554
GBS304          SEQ ID 5846
GBS305          SEQ ID 208
GBS306          SEQ ID 212
GBS307          SEQ ID 8992
GBS308          SEQ ID 8880
GBS309          SEQ ID 3386
GBS310          SEQ ID 286
GBS311          SEQ ID 3964
GBS312          SEQ ID 4660
GBS313          SEQ ID 4090
GBS314          SEQ ID 8556
GBS315          SEQ ID 1766
GBS316          SEQ ID 2000



GBS602          SEQ ID 8948
GBS603          SEQ ID 6132
GBS604          SEQ ID 5282
GBS605          SEQ ID 5302
GBS606          SEQ ID 8884
GBS607          SEQ ID 5314
GBS608          SEQ ID 8886
GBS609          SEQ ID 8888
GBS610          SEQ ID 8890
GBS611          SEQ ID 6028
GBS612          SEQ ID 8474
GBS613          SEQ ID 5092
GBS614          SEQ ID 8872
GBS615          SEQ ID 6052
GBS616          SEQ ID 8940
GBS617          SEQ ID 1824
GBS618          SEQ ID 6600
GBS619          SEQ ID 6608
GBS620          SEQ ID 6620
GBS621          SEQ ID 864
GBS622          SEQ ID 8640
GBS623          SEQ ID 8996
GBS624          SEQ ID 9050
GBS625          SEQ ID 2812
GBS626          SEQ ID 8858
GBS627          SEQ ID 8852
GBS628          SEQ ID 8784
GBS629          SEQ ID 6950
GBS630          SEQ ID 4502
GBS631          SEQ ID 4492
GBS632          SEQ ID 4488
GBS633          SEQ ID 8728
GBS634          SEQ ID 3066
GBS635          SEQ ID 8838
GBS636          SEQ ID 4772
GBS637          SEQ ID 8626
GBS638          SEQ ID 8984
GBS639          SEQ ID 8546
GBS640          SEQ ID 6780
GBS641          SEQ ID 900
GBS642          1312
GBS643          1772
GBS644          1956
GBS645          2726
GBS646          3348
GBS647          3770
GBS648          4934
GBS649          5076
GBS650          5446
GBS651          5602
GBS652          5610
GBS653          5760
GBS654          6096
GBS655          6656
GBS656          9324
GBS657          10782
GBS658          8802
GBS659          9344
GBS660          9410
GBS317          ^FO ID 4910
GBS318          SEQ ID 8548
GBS319          SEQ ID 892
GBS320
GBS321          OL W IU OOt-vJ
GBS322          SEQ ID 8540
GBS323          SEQ ID 2102
GBS324          9FO in ftAQD
GBS325          sfo in ftQnn
GBS326          SEQ ID 8630
GBS327          SEQ ID 5856
GBS328          OL W IU UU IU
GBS329          9FO in ftQPft
GBS330          SEQ ID 8792
GBS331          SEQ ID 922
GBS332          •^FO in 10(14
GBS333          9FO in 17ftfi
GBS334          SEQ ID 1784
GBS335          SEQ ID 1782
GBS336          ^fo in iftftfi
GBS337          ocn m oom
GBS338          9fo in ftfi^ft
GBS339          SEQ ID 2080
GBS340          SEQ ID 8594 & 8596
GBS341          SEQ ID 2280
GBS342          SEQ ID 2266
GBS343          SEQ ID 8644
GBS344          SEQ ID 8662



GBS661          Q498
GBS662          9286
GBS663          9294
GBS664
GBS665          1054R
GBS666          10610
GBS667          905?
GBS668          QO^fi
GBS669
GBS670
GBS671          9020
GBS672
GBS673
GBS674          9034
GBS675          10634
GBS676
GBS677
GBS678          9330
GBS679          9404
GBS680          DDDO
GBS681
GBS682
GBS683          9290
GBS684          9614
GBS685          10454
GBS686          2774
GBS687          4620
GBS688          10224



TABLE V - NUCLEOTIDES DELETED IN EXPRESSION OF GBSnnn PROTEINS



GBS     Deleted nucleotides
11d     1-153
31d     1-129
64d     1-165
68d     2029-2796
70d     1-402
74d     1-975
79d     1-201
105dN   2689-4119
105dC   1-2688
105d    1-2688
109d    1-120
130d    1-518
170d    1-111
182d    1596-1674
195C    1-1710
195N    1711-3243
209d    757-912
210d    1-99 & 777-879
220d    1-120
231d    1-54
235d    1-270
246d    1-75
248d    1-591



272d    1-531
277d    1-318
281d    1-54
287d    1-108
288d    1-72
293C    1-1229
293N    1230-2379
317N    1729-4107
317C    1-2379
326N    1707-2652
326dN   2326-3927
327N    3034-6831
327C    1-3033
333d    1-150
339d    1-111
352d    1-158
362N    1707-2652
362C    1-1706
397d    1-348
399d    1-111
407d    1174-1473
409d    1-297
424d    1327-1671
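The entries of Table V can be read as position ranges removed from a coding sequence before expression. The sketch below shows one way to apply such an entry to a sequence string. It rests on assumptions that this document does not state: positions are taken to be 1-based, inclusive, and numbered on the full-length sequence, and the function names and the toy sequence are hypothetical.

```python
import re


def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse a Table V style entry such as '1-99 & 777-879' into (start, end) pairs."""
    ranges = []
    for part in spec.split("&"):
        match = re.fullmatch(r"\s*(\d+)-(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unrecognised range: {part!r}")
        start, end = int(match.group(1)), int(match.group(2))
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part!r}")
        ranges.append((start, end))
    return ranges


def delete_nucleotides(sequence: str, spec: str) -> str:
    """Return the sequence with the listed positions removed.

    ASSUMPTION: positions are 1-based and inclusive, relative to the
    full-length input sequence (not stated explicitly in the table).
    """
    dropped = set()
    for start, end in parse_ranges(spec):
        if end > len(sequence):
            raise ValueError(f"range {start}-{end} exceeds sequence length")
        dropped.update(range(start, end + 1))
    return "".join(
        base for pos, base in enumerate(sequence, start=1) if pos not in dropped
    )


if __name__ == "__main__":
    toy = "ATGAAACCCGGGTTTTAA"  # hypothetical 18-nt sequence, not from this document
    print(delete_nucleotides(toy, "1-3 & 10-12"))
```

With the toy input, positions 1-3 and 10-12 are removed together, so the second range is not shifted by the first deletion. This matches reading an entry such as "1-99 & 777-879" as two ranges on the same original numbering.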
TABLE VI - PREDICTED FUNCTIONS FOR CERTAIN SEQ IDs



SEQ ID 


Function


« 


mannanoco ARp froncnnrfpr ATP-hinrfinn nrnfofn f"nQoR\ 
1 1 Id 1 lydi Icoc nDU U dl Ibpui LCI , /A l r ~IJI1 lull iy pi uici 1 1 ^pbdu j 


19 


II Ul I l^ol IcIdLcU ) nDO LI dl IbUUI Lei , pel 1 1 Icdoc pi Ulcli 1 yfJodO } 


18 


nontirh/l-nrnK/l pic_tranc icnmpr^Qo pvplnnhilin-twnp 
pepuuy i-pi (jiy i L»io Lidiio loui i ici aoc t oyoiupi iiiii i iy uc 


9fi 


r*hf"\ricmof o hinHinn on7\/mo ^r>ahR\ 
\j\ loi i idLc uiiiuiiiy ciiAynio ^auDj 




(JIUUaUIC LI dl lofJUodoc \\\ lbcl UUI 1 ocLjucI IL.C lOOO 1 ^ 


49 


npntiHace* M90/M9 C I/M40 famiK/ 
pepuudoc, ivi^u/ivi^i.o/iviH'U idiiuiy 




UIUU, LldllbpUILcM 


^o 


IIUUbUHIdl piULclll 1-1 1 ^ipir\J 


^4 


riKncnmal r»rr\foin 1 1 frnlAA 
I IUUoUI I Idl UIULCII 1 ui ijpi/A^ 


R9 


pcpUUc /ADO Lldl lopUf Lcl , peiilicdbc piULClll 


DO 


pcpiluc /ADO Lldl loLJUl Lcl , pel 1 llcdbc piUltJII 1 


7ft 


i iriHv/latd Unoco / r\\/rl— 1\ 
UIIUyldLc Kllldbc ^pyinj 


ft4 

OH- 


IlUUbUlllo ic(jyvjllliy IdtAUI Mil; 


104 

IUH- 


Dh/~\W fomilw r^iwtoin /r\hnl— 1^ 

rjiun idiiuiy piuLciii ^pnun^ 


1 10 

I I u 


IVIUL 1 / 1 IUUIa Idllllly piULclll bUpcildMllly 


1 1fi 
I IO 


leLracenornycin poiyKeiiae syninesis u-iTietnyuransTerdse i cmr 


1*34 


pnospnopdnieineine aaenyiyiirdnsTerdSe ^coauj 


140 
IH-U 


ruz. aoiTidin proiein 


144 


o-nucicoiicidSe Tdrnuy proxein 


1 00 


vanzr-reidiea proLcin 


I0o 


AtsLr iransporter, ai r-Dinaing/permease proiein 


1R0 
lOZ 


adi^ iransponer, ai r-uinaing/permease proiein 


I DO 


Dioi TdiTiiiy proxein 


iro 
lou 


aceiyi-oOA aceiyiirdnsTerdse 


loo 


enuonuciease in \rnnj 




/■fit i^A^mooo / 

giuouKinase vyKij 


?no 


rhnrianfaco fomil\/ nrr\foin 
1 1 luudi it^ot; idiiuiy piuitjni 


904 


eionganon Tacior i u Tdmiiy pruiein \iypM; 


919 
Z I Z 


uu"-iN-aceLyigiuuubdriiiiic--iN-dL.cLyiiiiui pyiupiiubpiiuiyi- 


91ft 
Z i o 


ceil uivision proiein uivid 


990 


r*oll Hi\/icion nrntoin PtcA rftcA^ 

LfCIl Ul VlblU) 1 piUlGMI nLo/A ^lib/Ay 


994 

ZZ*t 


Ocll UlVlblUIl piULcill] rlbZ. ULbZ.^ 


ZOO 


wlml— 1 nrntflin / \/lmW\ 

y ii i in piumiii \yinin^ 


940 


i->j-ill ri i\/ioion nr^\toiri l~li\/l\/ A / Hi\/I\/ A\ 
OtJU UlVlblUIl piULcill UlVIVrt ^vJIVIVMJ 


944 


ibLiicuoyi Lr\iN/A oyim icidoc y i it? o j 


9^9 
zoz 


IVIUL 1 / 1 IUUIa Idllllly piULcill 


ZOO 


/A 1 r "UcptJI lUct IL wip piUlcdbc, /A I r -Ull lUlliy bUUUIUL OipC ^L-ipC.^ 


<coo 


1 1 mil lyitJi ifcJLtJu di lyuiuiuidic uci iyui uy cf idbt;/i iiuii it?i lyiicii di iyui uiuidit? oyuiui i 


274 


PYorlpo.Y\/rihnni irlpfl^p lamp ^iihunit fx^pA^ 




PYnHpnY\/rihnni ipIpocp \/|| email cnhnnit ^y^pR^ 

CAUUCUAy 1 IUUI lUUlCdbC Vll, Ollldll bUUUIML IjAOCU J 


9ft9 


noran\/ltranctrancfciraco ficnA^ 
yci di iy hi di tbit di loici doc ijbp/Aj 


9ftfi 

Z.UU 


homnl\/cin A 
i let i luiybii i /a 


290 


tran^rrintinnpl rpnrp^nr 

LI dl IOv_»l lULlUI Idl 1 Cpi COOU 1 


9QR 


I^MA rpnoir nmtoin Ror'Kl /ror*M^ 

L-/IM/A ICpdll piULCIII i\COIN ^ICL»IN^ 




rfort\/ fomiK/ nmtoin /rlon\/A 
ucy v idi i my pi ulcm i \ucy v ) 


\Jc-C 


npntiHp ARO tran^nnr+pr nprmpaQP nrntpin fnnnO^ 
pcpiiuc /ado ii di ibpui lei , pel 1 1 icdbc pi ulcii i ppv> ) 


326     peptide ABC transporter, ATP-binding protein (oppD)
328     peptide ABC transporter, ATP-binding protein (oppF)
348     4-diphosphocytidyl-2C-methyl-D-erythritol kinase (ispE)
352     adc operon repressor AdcR (adcR)
356     zinc ABC transporter, ATP-binding protein (adcC)
370     tyrosyl-tRNA synthetase (tyrS)
374     penicillin-binding protein 1B (pbp1B)
378     DNA-directed RNA polymerase, beta subunit (rpoB)
382     DNA-directed RNA polymerase beta' chain
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390 


comnptpnop nrotpin OalA imlA^ 


406 


acetate kinass (ackA) 


410 


transcriptional regulator 


418 


nvrrolinp-'S-rarhoYvlatp rpHnrta^p fnrof^ 


422 


oh itamvl-aminonpntiria^p inpnA\ 

L-JI vJ Ldl 1 1 y IOI llll IVJLJCr|-'llvJdOvs \ \J\D\Jr\ ) 


432 


thioredoxin family protein 


436 


tRNA binding domain protein (pheT) 


440 


mpthvltran^fprasp 
■ 1 icu 1 y 111 cii 10 1 C7i doc? 


442 


sinale-^trand DIMA-hindino nrotpin authentic noint mutation fssbB^ 


454 


GAF domain protein (lytS) 


466 


IrgB protein (IrgB) 


474 


olinonpntidp ARC tran^norfpr nprrnpa^p nrotpin 

wiiy wj_»v^L/iivj\_» nuv 11 cii iojjvji ivvi , pvi 1 1 luaou ^iv/iwiii 


476 


npntidp ARC tran^nortpr ATP-hindino nrotpin 

|JC/ tJLIV-l<_> I \ 1— ' V-/ LI Ci 1 lufJUl Ivl ) / \ 1 1 kJl 1 1 VJ 1 1 \J\\J lul 1 1 


480 


peptide ABC transporter, ATP-binding protein (oppF) 


484 


PTS system, IIABC components (treB) 


488 


alnha amvla^p familv nrotpin rtrpCV\ 

CI1LJI ICl C41 1 1 y luot/ 1 CII I Illy yl\ \J\Sj\\ 1 I LI \j V_y 1 


494 


tran^nrintional rpoulator RnlG familv 


506 


transcriptional regulator, BgIG family 


508 


PTS svstem IIB comoonent 

1 1 \J J * ' 1 ' ' I—' wwl 1 luwi 1 v> 1 IL 


514 


PTS ^vstpm IIC oomnonpnt 

1 1 vJ OyOlCI II, II w vAJI 1 ILJUI ICIIL 


518 


tran^kptola^p Nl-tprminal ^uhiinit t'tktA^ 

Ll Ol iOI\v3 Lvl CIOC» , 1 N LCI 1 1 111 1C1I OUUUI ML ^ LrXLrAy 


528 


ribosomal orotein S15 (YdsO) 


546 


ov^tpinvl-tRNA <%vnthpta^p i'ov^S^ 

uyOlull iy 1 LI XI iri Oj 1 1 LI IvluOvy \ v/y O VJ / 


554 


RMA mAthvltrancyffaraop TrmH familv nrotin ^ 
1 \i N/ \ 1 1 icu iy in cii loici ctoc, 1 1 1 1 11 1 idi 1 my , y i uu^j «j 




degV family protein (degV)


572 


ribosomal protein frn^h


576 


integrase, phage family


580 


transcriptional regulator


' v > W 


recombination protein


626 


transcriptional regulator MutR


630 


transporter


640 


amino acid ABC transporter, permease protein (opuBB)


642 


glycine betaine/L-proline transport ATP binding subunit (proV)


654 


lectin, alpha subunit precursor 


662 


transcriptional regulator 




acetyltransferase, GNAT family


666


acetyltransferase, GNAT family (rim W


670 


acetyltransferase, GNAT family 


676 


transcriptional regulator, tetR family domain protein 


680 


ABC transporter efflux protein, DrrB family


690 


IS1381, transposase OrfA/OrfB, truncation


714 


magnesium transporter, CorA family 


718 


oxidoreductase, Gfo/Idh/MocA family


722


valyl-tRNA synthetase (valS)


1 \J\J 


acetyltransferase, GNAT family


746 


methyltransferase


750 


bacteriophage L54a integrase


754 


DNA-damage-inducible protein J


774 


cation efflux system protein


778 


oxidoreductase, aldo/keto reductase family 


784 


alcohol dehydrogenase, zinc-containing 


790 


3-oxoadipate enol-lactone hydrolase/4-carboxymuconolactone decarboxylas


804 


ribonucleoside-diphosphate reductase, alpha subunit (nrdE)


808 


nrdI protein (nrdI)


812 


Ribonucleotide reductases 


824 


elaA protein (elaA) 


828 


RNA methyltransferase, TrmA family


832 


RecX family protein 


840 


-identity (jag) 
844


membrane protein, 60 kDa (yidC) 


856 


UTP-glucose-1-phosphate uridylyltransferase (galU)


864 


rhomboid family protein 


884 


MORN motif family 


892 


transcriptional regulator 


896 


adenylosuccinate lyase (purB) 


908 


phosphoribosylaminoimidazole carboxylase, catalytic subunit (purE) 


912 


phosphoribosylamine-glycine ligase (purD) 


916 


phosphosugar-binding transcriptional regulator 


920 


acetyl xylan esterase 


922 


ROK family protein (gki) 


926 


N-acetylneuraminate lyase (nanA) 


936 


sugar ABC transporter, permease protein 


940 


sugar ABC transporter, permease protein (msmF) 


952 


LysM domain protein, authentic frameshift 


956 


zoocin A endopeptidase 


958 


phosphoribosylaminoimidazolecarboxamide formyltransferase/IMP cyclohydr


962 


acetyltransferase, GNAT family


964 


phosphoribosylglycinamide formyltransferase (purN) 


968 


phosphoribosylformylglycinamidine cyclo-ligase (purM) 


972 


amidophosphoribosyltransferase (purF) 


980 


phosphoribosylformylglycinamidine synthase 


984 


phosphoribosylaminoimidazole-succinocarboxamide synthase (purC) 


1042 


oligoendopeptidase F (pepF) 


1060 


ebsC protein 


1068 


hydrolase, haloacid dehalogenase-like family 


1076 


riboflavin synthase, beta subunit (ribH) 


1082 


riboflavin biosynthesis protein RibD (ribD) 


1086 


Mn2+/Fe2+ transporter, NRAMP family 


1094 


peptidase, U32 family 


1116 


HPr(Ser) kinase/phosphatase (hprK) 


1130 


oxidoreductase 


1148 


signal recognition particle-docking protein FtsY (ftsY) 


1152 


Cof family protein 


1156 


Cof family protein 


1172 


vicX protein (vicX) 


1176 


sensory box sensor histidine kinase (vicK) 


1180 


DNA-binding response regulator (vicR) 


1184 


amino acid ABC transporter, ATP-binding protein 


1188 


amino acid ABC transporter, amino acid-binding protein (fliY) 


1192 


amino acid ABC transporter, permease protein 


1196 


amino acid ABC transporter, permease protein 


1208 


DNA-binding response regulator (vicR) 


1210 


threonyl-tRNA synthetase (thrS) 


1214 


glycosyl transferase, group 1


1218 


glycosyl transferase, group 1 (cpoA) 


1222 


alpha-amylase (amy) 


1230 


proline dipeptidase (pepQ) 


1238 


haloacid dehalogenase-like hydrolase superfamily


1244 


mannonate dehydratase (uxuA) 


1248 


glucuronate isomerase 


1254 


transcriptional regulator, GntR family 


1268 


sodium:galactoside symporter family protein


1270 


D-isomer specific 2-hydroxyacid dehydrogenase family protein 


1282 


transcriptional regulator, LysR family 


1290 


ABC transporter, ATP-binding protein (potA) 


1296 


DedA family protein 


1308 


MutT/nudix family protein


1310 


phosphoserine phosphatase SerB (serB) 



